InternLM/WildClawBench

An in-the-wild benchmark for AI agents in the OpenClaw Environment.

☆480

Alternatives and similar repositories for WildClawBench

Users who are interested in WildClawBench are comparing it to the repositories listed below.

claw-eval / claw-eval
Claw-Eval is a harness for evaluating LLMs as agents. All tasks are verified by humans.
☆726 | May 17, 2026 | Updated 2 months ago
InternLM / EndoCoT
[ECCV 2026] An official implementation of "EndoCoT". Scaling endogenous Chain-of-Thought (CoT) reasoning in diffusion models for complex …
☆43 | Jun 26, 2026 | Updated 3 weeks ago
EnigmaYYYY / SocialClaw
SocialClaw is a screen-aware social copilot that watches live chat windows, builds personalized memory and profile context, and suggests …
☆40 | Apr 9, 2026 | Updated 3 months ago
Cooperx521 / ScaleCap
(ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'
☆60 | Jan 26, 2026 | Updated 5 months ago
InternLM / ETCHR
A question-conditioned, reasoning-aware image editor designed to serve as a decoupled visual reasoning assistant for Multimodal Large Lan…
☆23 | May 25, 2026 | Updated last month
InternLM / StarBench
[ICLR 2026] An official implementation of "STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence"
☆42 | Apr 19, 2026 | Updated 3 months ago
Maplebb / LoMo
Offline implementation of LoMo: Local Modality Substitution for Deeper Vision-Language Fusion.
☆25 | Jun 1, 2026 | Updated last month
SKYLENAGE-AI / QwenClawBench
General Agent Benchmark for OpenClaw, made by Qwen Team, Alibaba Group.
☆58 | Jun 10, 2026 | Updated last month
InternLM / Spark
An official implementation of "SPARK: Synergistic Policy And Reward Co-Evolving Framework"
☆25 | Oct 23, 2025 | Updated 8 months ago
OpenEvaluation / VLMEvalKit
☆23 | Apr 11, 2026 | Updated 3 months ago
InternLM / Visual-ERM
Official Implementation of "Visual-ERM: Reward Modeling for Visual Equivalence"
☆64 | Mar 23, 2026 | Updated 3 months ago
OpenIXCLab / CODA
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
☆37 | Aug 28, 2025 | Updated 10 months ago
evolvent-ai / ClawMark
🦞 ClawMark: A Living-World Benchmark for Multi-Day, Multimodal Coworker Agents
☆116 | May 28, 2026 | Updated last month
InternLM / CapRL
[ICLR 2026] An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning"
☆225 | Jun 23, 2026 | Updated 3 weeks ago
InternLM / Spatial-SSRL
[CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
☆133 | Apr 7, 2026 | Updated 3 months ago
SYuan03 / MM-IFEngine
[ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following
☆124 | Feb 13, 2026 | Updated 5 months ago
beichenzbc / BoostStep
Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
☆37 | Jan 21, 2025 | Updated last year
open-compass / Creation-MMBench
Assessing Context-Aware Creative Intelligence in MLLMs
☆23 | Jul 22, 2025 | Updated 11 months ago
Liuziyu77 / MIA-DPO
Official implementation of MIA-DPO
☆69 | Jan 23, 2025 | Updated last year
yjyddq / EOSER-ASS-RL
Official Repository of "Taming Masked Diffusion Language Models via Consistency Trajectory Reinforcement Learning with Fewer Decoding Ste…
☆28 | Mar 9, 2026 | Updated 4 months ago
sqs-ustc / tool-reasoning-framework-PTE
☆38 | Jan 1, 2026 | Updated 6 months ago
InternLM / ARM-Thinker
[CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
☆194 | Feb 13, 2026 | Updated 5 months ago
JiaranI / mihomo-upstream-proxy-setup
☆45 | Mar 30, 2026 | Updated 3 months ago
pinchbench / skill
PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
☆1,285 | Jul 2, 2026 | Updated 2 weeks ago
KnowledgeXLab / skill-git
Supercharge your AI agents by versioning, tracking, and merging overlapping skills.
☆40 | Apr 9, 2026 | Updated 3 months ago
benchflow-ai / skillsbench
SkillsBench evaluates how well skills work and how effective agents are at using them.
☆1,541 | Updated this week
ClawGym / ClawGym-Bench
☆18 | May 15, 2026 | Updated 2 months ago
ALEX-nlp / DenoiseRL
DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes
☆36 | Updated this week
Gen-Verse / OpenClaw-RL
OpenClaw-RL: Train any agent simply by talking
☆5,588 | May 23, 2026 | Updated last month
OpenDataBox / Workspace-Bench
A benchmark for self-evolving agents on realistic large-scale file workspaces
☆43 | Updated this week
Mark12Ding / Dispider
[CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
☆180 | Mar 23, 2025 | Updated last year
bcmi / Granular-GRPO
[CVPR 2026] Fine-Grained GRPO for Precise Preference Alignment in Flow Models
☆64 | Jun 1, 2026 | Updated last month
shikiw / Modality-Integration-Rate
[ICCV 2025] The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration R…
☆113 | Jul 9, 2025 | Updated last year
Liuziyu77 / gene-skill
Gene-skill: Throw a few Skills into the “gene blender” and shake out a new Skill that gets more done.
☆59 | Apr 17, 2026 | Updated 3 months ago
InternLM / OVO-S-Bench
An official implementation of "OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs"
☆47 | Jun 24, 2026 | Updated 3 weeks ago
Li-Jinsong / DAEDAL
[ICLR 2026] Official repository of "Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models"
☆173 | Feb 16, 2026 | Updated 5 months ago
GAIR-NLP / daVinci-Agency
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
☆38 | Feb 4, 2026 | Updated 5 months ago
InternLM / SIM-CoT
[ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"
☆212 | Apr 13, 2026 | Updated 3 months ago
SunzeY / SEAgent
[ICML-2026] Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
☆257 | Aug 7, 2025 | Updated 11 months ago