fudan-zvg / S-Agents
Official repository of S-Agents: Self-organizing Agents in Open-ended Environment
☆25 · Updated last year
Alternatives and similar repositories for S-Agents:
Users who are interested in S-Agents are comparing it to the repositories listed below.
- Official implementation of "Self-Improving Video Generation" ☆62 · Updated last month
- Official PyTorch Implementation for Task Vectors are Cross-Modal ☆22 · Updated 4 months ago
- [NeurIPSw'24] This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simu… ☆87 · Updated 2 months ago
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024) ☆29 · Updated 9 months ago
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR 2025) ☆39 · Updated last week
- A Video Tokenizer Evaluation Dataset ☆112 · Updated 3 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆42 · Updated last month
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆36 · Updated last year
- Official repo for StableLLAVA ☆95 · Updated last year
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆42 · Updated 9 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆56 · Updated this week
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆34 · Updated 9 months ago
- This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025] ☆62 · Updated this week
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆34 · Updated 9 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆42 · Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆69 · Updated last month
- Learning to Identify Critical States for Reinforcement Learning from Videos (Accepted to ICCV'23) ☆26 · Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆60 · Updated 9 months ago
- Multimodal RewardBench ☆38 · Updated 2 months ago
- Language Repository for Long Video Understanding ☆31 · Updated 10 months ago
- Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse" ☆69 · Updated 3 weeks ago
- ElasticTok: Adaptive Tokenization for Image and Video ☆66 · Updated 5 months ago
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆55 · Updated 2 months ago