An in-the-wild benchmark for AI agents in the OpenClaw Environment.
☆364May 14, 2026Updated this week
Alternatives and similar repositories for WildClawBench
Users that are interested in WildClawBench are comparing it to the libraries listed below.
- [CVPR 2026] An official implementation of "Think Visually, Reason Textually: Vision-Language Synergy in ARC"☆41Nov 26, 2025Updated 5 months ago
- [ICML 2026] InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem☆22Apr 7, 2026Updated last month
- [ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"☆204Apr 13, 2026Updated last month
- Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks☆72May 7, 2026Updated last week
- Survey of small language models☆18Jul 23, 2024Updated last year
- [ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following☆122Feb 13, 2026Updated 3 months ago
- Repository for SoMeLVLM: A Large Vision Language Model for Social Media Processing☆14Oct 9, 2025Updated 7 months ago
- [CVPR 2026 Oral] A training-free, mask-free framework for 3D shape editing.☆38May 9, 2026Updated last week
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"☆37Jan 21, 2025Updated last year
- 🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access☆55Jun 2, 2025Updated 11 months ago
- ☆33May 27, 2025Updated 11 months ago
- Partial data archive of the Nanjing University Lily BBS (小百合BBS), as of early July 2020; source: http://bbs.nju.edu.cn/☆17Nov 2, 2020Updated 5 years ago
- Implementation of Implicit Knowledge Extraction Attack.☆23Apr 17, 2026Updated last month
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"☆23Mar 4, 2025Updated last year
- Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks verified by humans.☆563May 8, 2026Updated last week
- This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur…☆11Aug 24, 2022Updated 3 years ago
- ☆27Oct 27, 2025Updated 6 months ago
- daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently☆39Feb 4, 2026Updated 3 months ago
- fastNLP reimplementation of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction"☆11Dec 11, 2020Updated 5 years ago
- ☆28Jul 11, 2024Updated last year
- Official implementation of SIGIR 2022 Paper "Task-Oriented Dialogue System as Natural Language Generation".☆14Apr 6, 2022Updated 4 years ago
- Focused Papers, Delivered Simply :)☆55Dec 25, 2025Updated 4 months ago
- Official release of code for the paper "RL is a hammer and LLMs are nails: A simple RL approach to stronger prompt injection attacks"☆48May 6, 2026Updated last week
- A simple 2D ball collision engine.☆12Jun 15, 2023Updated 2 years ago
- Extending context length of visual language models☆12Dec 18, 2024Updated last year
- Official implementation of the benchmarked 2D, 3D classification, and 3D semantic segmentation models on PeRFception.☆14Jan 21, 2023Updated 3 years ago
- Repo for Anonymous purpose, pls don't distribute☆10Oct 2, 2024Updated last year
- ☆10Aug 19, 2023Updated 2 years ago
- Code and data for "Medical Dialogue Generation via Dual Flow Modeling" (ACL 2023 Findings)☆14Nov 22, 2023Updated 2 years ago
- ☆29Mar 16, 2025Updated last year
- ☆12May 27, 2022Updated 3 years ago
- Universal preflight security scanner for AI coding agents — Detects hooks injection, credential exfiltration & backdoors in .cursorrules,…☆70Apr 9, 2026Updated last month
- ☆34Oct 21, 2025Updated 6 months ago
- 💻 SETA: Scaling Environments for Terminal Agents - Environments☆135Feb 16, 2026Updated 3 months ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆23Apr 10, 2026Updated last month
- A package for fine tuning of pretrained NLP transformers using Semi Supervised Learning☆14Oct 27, 2021Updated 4 years ago
- The OlymMATH dataset☆24Jun 1, 2025Updated 11 months ago
- Code for our Bioinformatics 2022 paper: "DxFormer: A Decoupled Automatic Diagnostic System Based on Decoder-Encoder Transformer with Dens…☆11Dec 24, 2022Updated 3 years ago
- Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing☆14Feb 18, 2021Updated 5 years ago