An in-the-wild benchmark for AI agents in the OpenClaw Environment.
☆453Jun 25, 2026Updated this week
Alternatives and similar repositories for WildClawBench
Users that are interested in WildClawBench are comparing it to the libraries listed below.
- [ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache☆62Jan 26, 2026Updated 5 months ago
- [CVPR 2026] An official implementation of "Think Visually, Reason Textually: Vision-Language Synergy in ARC"☆46Nov 26, 2025Updated 7 months ago
- [ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"☆207Apr 13, 2026Updated 2 months ago
- SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement☆65Jun 11, 2026Updated 2 weeks ago
- Survey of small language models☆18Jul 23, 2024Updated last year
- Proactive security monitoring for OpenClaw deployments. Detects ClawHavoc, AMOS stealer, CVE-2026-25253, memory poisoning, and supply cha…☆47Updated this week
- Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks☆79May 7, 2026Updated last month
- [ICML 2026] InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem☆25Jun 21, 2026Updated last week
- Repository for SoMeLVLM: A Large Vision Language Model for Social Media Processing☆14Oct 9, 2025Updated 8 months ago
- [ICLR 2025] FLAT: LLM Unlearning via Loss Adjustment with Only Forget Data☆14Feb 26, 2025Updated last year
- OPSTL: Self-supervised Skeleton-based Action Recognition in Occluded Environments☆14Oct 25, 2023Updated 2 years ago
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆60Jan 26, 2026Updated 5 months ago
- 🔥🔥🔥 Detecting hidden backdoors in Large Language Models with only black-box access☆58Jun 2, 2025Updated last year
- ☆27Jan 5, 2026Updated 5 months ago
- Official codebase of the paper -- Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills☆162May 1, 2026Updated last month
- Implementation of the Implicit Knowledge Extraction Attack.☆23Apr 17, 2026Updated 2 months ago
- Official PyTorch code for ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models"☆23Mar 4, 2025Updated last year
- [CVPR 2025] Official implementation of ByTheWay: Boost Your Text-to-Video Generation Model to Higher Quality in a Training-free Way☆48Oct 10, 2025Updated 8 months ago
- This is the implementation for IEEE S&P 2022 paper "Model Orthogonalization: Class Distance Hardening in Neural Networks for Better Secur…☆11Aug 24, 2022Updated 3 years ago
- daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently☆39Feb 4, 2026Updated 4 months ago
- ☆27Oct 27, 2025Updated 8 months ago
- arXiv 2024 | ZIP: entropy-law data selection for efficient LLM alignment.☆28Jun 10, 2026Updated 2 weeks ago
- Official implementation of SIGIR 2022 Paper "Task-Oriented Dialogue System as Natural Language Generation".☆14Apr 6, 2022Updated 4 years ago
- Focused Papers, Delivered Simply :)☆55Dec 25, 2025Updated 6 months ago
- Token-level adaptation of LoRA matrices for downstream task generalization.☆15Apr 14, 2024Updated 2 years ago
- Claw-Eval is an evaluation harness for evaluating LLM as agents. All tasks verified by humans.☆684May 17, 2026Updated last month
- Source code for NeurIPS 2019 paper "Learning Latent Processes from High-Dimensional Event Sequences via Efficient Sampling"☆10Mar 20, 2021Updated 5 years ago
- Multi-encoder segmentation for contrail detection in satellite imagery | Google Researc☆12Jan 28, 2026Updated 5 months ago
- Extending context length of visual language models☆12Dec 18, 2024Updated last year
- Official code for Guiding Language Model Math Reasoning with Planning Tokens☆19Feb 29, 2024Updated 2 years ago
- Official implementation of the benchmarked 2D and 3D classification and 3D semantic segmentation models on PeRFception.☆14Jan 21, 2023Updated 3 years ago
- Repo for Anonymous purpose, pls don't distribute☆10Oct 2, 2024Updated last year
- Benchmarking LLMs and Agents in Rigorous Financial Analysis and Forecast☆28May 10, 2026Updated last month
- Scheduled automatic login for the ECNU campus network☆14Jul 24, 2024Updated last year
- ☆13Oct 31, 2024Updated last year
- This is the official code repository for the paper: Towards General Continuous Memory for Vision-Language Models.☆31Jul 3, 2025Updated 11 months ago
- ☆10Aug 19, 2023Updated 2 years ago
- ☆33Mar 16, 2025Updated last year
- ☆28Apr 14, 2025Updated last year