fudan-zvg / S-Agents

Official repository of S-Agents: Self-organizing Agents in Open-ended Environment

☆26

Alternatives and similar repositories for S-Agents

Users that are interested in S-Agents are comparing it to the libraries listed below


Zhoues / MineDreamer
[IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula…
☆97 · Updated 5 months ago
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆103 · Updated 2 years ago
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆77 · Updated 4 months ago
AILab-CVC / VL-GPT
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
☆86 · Updated last year
zeyofu / BLINK_Benchmark
This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or…
☆149 · Updated last month
aszala / EnvGen
Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)
☆38 · Updated last year
si0wang / VisVM
☆46 · Updated 10 months ago
PatrickHua / Awesome-World-Models
This repository is a collection of research papers on World Models.
☆42 · Updated 2 years ago
allenai / unified-io-2.pytorch
☆78 · Updated last year
OpenGVLab / EmbodiedGPT
☆33 · Updated 2 years ago
Chenyu-Wang567 / MLLM-Tool
MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning
☆134 · Updated last month
zzxslp / SoM-LLaVA
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
☆145 · Updated last year
Video-as-Agent / VideoAgent
Official implementation of "Self-Improving Video Generation"
☆75 · Updated 6 months ago
CraftJarvis / JarvisVLA
Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse"
☆107 · Updated 2 months ago
kaiyuyue / nxtp
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆181 · Updated 6 months ago
facebookresearch / VLaMP
Code for “Pretrained Language Models as Visual Planners for Human Assistance”
☆61 · Updated 2 years ago
kkahatapitiya / LangRepo
Code for our ACL 2025 paper "Language Repository for Long Video Understanding"
☆32 · Updated last year
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆94 · Updated last year
rese1f / STEVE
[ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment
☆39 · Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆148 · Updated last year
CraftJarvis / ROCKET-1
Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR'25)
☆45 · Updated 7 months ago
chenllliang / G1
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
☆88 · Updated 6 months ago
Beckschen / LLaVolta
[NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression
☆61 · Updated 9 months ago
facebookresearch / multimodal_rewardbench
Multimodal RewardBench
☆54 · Updated 9 months ago
isekai-portal / Link-Context-Learning
☆100 · Updated last year
VIRL-Platform / VIRL
(ECCV 2024) Code for V-IRL: Grounding Virtual Intelligence in Real Life
☆364 · Updated 11 months ago
UX-Decoder / FIND
[NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"
☆128 · Updated last year
IranQin / MP5
[CVPR2024] This is the official implementation of MP5
☆106 · Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆66 · Updated last year
ChenYi99 / EgoPlan
[IJCV] EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
☆74 · Updated 11 months ago