fudan-zvg / S-Agents
Official repository of S-Agents: Self-organizing Agents in Open-ended Environment
☆17 Updated 7 months ago
Related projects
Alternatives and complementary repositories for S-Agents
- [NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking" ☆29 Updated last month
- ☆58 Updated last month
- This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Co… ☆71 Updated 4 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 Updated last month
- SafeSora is a human preference dataset designed to support safety alignment research in the text-to-video generation field, aiming to enh… ☆25 Updated 2 months ago
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆65 Updated 4 months ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆31 Updated last month
- VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆120 Updated 2 weeks ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models ☆36 Updated last year
- [CVPR2024] This is the official implementation of MP5 ☆83 Updated 4 months ago
- The paper collections for the autoregressive models in vision. ☆95 Updated this week
- This is a repo to track the latest autoregressive visual generation papers. ☆41 Updated 3 weeks ago
- ☆33 Updated last year
- ☆27 Updated 4 months ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆59 Updated last month
- Paper List for In-context Learning 🌷 ☆20 Updated last year
- Official implementation of "Self-Improving Video Generation" ☆47 Updated this week
- Official repository for "iVideoGPT: Interactive VideoGPTs are Scalable World Models" (NeurIPS 2024), https://arxiv.org/abs/2405.15223 ☆68 Updated last week
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆41 Updated last week
- ☆64 Updated 4 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆44 Updated this week
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆91 Updated last year
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆40 Updated last week
- Official implementation of MIA-DPO ☆32 Updated this week
- Official implementation for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ☆42 Updated last year
- The official implementation for "MonoFormer: One Transformer for Both Diffusion and Autoregression" ☆76 Updated 3 weeks ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆75 Updated 2 months ago
- Egocentric Video Understanding Dataset (EVUD) ☆23 Updated 4 months ago
- ElasticTok: Adaptive Tokenization for Image and Video ☆31 Updated this week