An in-the-wild benchmark for AI agents in the OpenClaw Environment.
☆318 · Apr 21, 2026 · Updated last week
Alternatives and similar repositories for WildClawBench
Users that are interested in WildClawBench are comparing it to the libraries listed below.
- InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem ☆21 · Apr 7, 2026 · Updated 3 weeks ago
- Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks ☆63 · Apr 8, 2026 · Updated 3 weeks ago
- [AAAI 2026] This is the official implementation of the paper "ExtendAttack: Attacking Servers of LRMs via Extending Reasoning". ☆22 · Mar 18, 2026 · Updated last month
- [CVPR 2026 Oral] A training-free, mask-free framework for 3D shape editing. ☆31 · Apr 6, 2026 · Updated 3 weeks ago
- [ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following ☆122 · Feb 13, 2026 · Updated 2 months ago
- OPSTL: Self-supervised Skeleton-based Action Recognition in Occluded Environments ☆14 · Oct 25, 2023 · Updated 2 years ago
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data ☆14 · Feb 26, 2025 · Updated last year
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆37 · Jan 21, 2025 · Updated last year
- ☆61 · Apr 7, 2026 · Updated 3 weeks ago
- Claw-Eval is an evaluation harness for evaluating LLM as agents. All tasks verified by humans. ☆487 · Updated this week
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆59 · Jan 26, 2026 · Updated 3 months ago
- 🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access ☆55 · Jun 2, 2025 · Updated 10 months ago
- ☆23 · Jan 5, 2026 · Updated 3 months ago
- ☆33 · May 27, 2025 · Updated 11 months ago
- Implementation of Implicit Knowledge Extraction Attack. ☆22 · Apr 17, 2026 · Updated last week
- [CVPR 2025] Official implementation of ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way ☆48 · Oct 10, 2025 · Updated 6 months ago
- Official implementation of SIGIR 2022 Paper "Task-Oriented Dialogue System as Natural Language Generation". ☆14 · Apr 6, 2022 · Updated 4 years ago
- Focused Papers, Delivered Simply :) ☆55 · Dec 25, 2025 · Updated 4 months ago
- Benchmarking LLMs and Agents in Rigorous Financial Analysis and Forecast ☆23 · Jan 9, 2026 · Updated 3 months ago
- Official release of code for the paper RL is a hammer and LLMs are nails: A simple RL approach to stronger prompt injection attacks ☆45 · Apr 13, 2026 · Updated 2 weeks ago
- Multi-encoder segmentation for contrail detection in satellite imagery | Google Researc ☆12 · Jan 28, 2026 · Updated 3 months ago
- A simple 2D ball collision engine. ☆12 · Jun 15, 2023 · Updated 2 years ago
- Extending context length of visual language models ☆12 · Dec 18, 2024 · Updated last year
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆19 · Feb 29, 2024 · Updated 2 years ago
- Repo for Anonymous purpose, pls don't distribute ☆10 · Oct 2, 2024 · Updated last year
- ECNU campus network scheduled automatic login ☆14 · Jul 24, 2024 · Updated last year
- This is the official code repository for the paper: Towards General Continuous Memory for Vision-Language Models. ☆26 · Jul 3, 2025 · Updated 9 months ago
- ☆10 · Aug 19, 2023 · Updated 2 years ago
- Code and data for "Medical Dialogue Generation via Dual Flow Modeling" (ACL 2023 Findings) ☆14 · Nov 22, 2023 · Updated 2 years ago
- ☆27 · Apr 14, 2025 · Updated last year
- Autoresearch for LLM adversarial attacks ☆206 · Apr 10, 2026 · Updated 2 weeks ago
- ☆14 · Oct 11, 2023 · Updated 2 years ago
- ☆12 · May 27, 2022 · Updated 3 years ago
- 💻 SETA: Scaling Environments for Terminal Agents - Environments ☆129 · Feb 16, 2026 · Updated 2 months ago
- Universal preflight security scanner for AI coding agents — Detects hooks injection, credential exfiltration & backdoors in .cursorrules,… ☆68 · Apr 9, 2026 · Updated 2 weeks ago
- ☆34 · Oct 21, 2025 · Updated 6 months ago
- ☆45 · Oct 12, 2025 · Updated 6 months ago
- The OlymMATH dataset ☆24 · Jun 1, 2025 · Updated 10 months ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts ☆22 · Apr 10, 2026 · Updated 2 weeks ago