pinchbench/skill

PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

☆1,299

Alternatives and similar repositories for skill

Users who are interested in skill are comparing it to the libraries listed below.

claw-eval / claw-eval
Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks are verified by humans.
☆733 · May 17, 2026 · Updated 2 months ago
InternLM / WildClawBench
An in-the-wild benchmark for AI agents in the OpenClaw Environment.
☆484 · Updated this week
SKYLENAGE-AI / QwenClawBench
General Agent Benchmark for OpenClaw, made by Qwen Team, Alibaba Group.
☆59 · Jun 10, 2026 · Updated last month
benchflow-ai / skillsbench
SkillsBench evaluates how well skills work and how effective agents are at using them.
☆1,575 · Updated this week
Gen-Verse / OpenClaw-RL
OpenClaw-RL: Train any agent simply by talking
☆5,606 · May 23, 2026 · Updated 2 months ago
hkust-nlp / Toolathlon
[ICLR 2026] The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution
☆438 · Updated this week
harbor-framework / terminal-bench
A benchmark for LLMs on complicated tasks in the terminal
☆2,482 · Jul 11, 2026 · Updated 2 weeks ago
openclaw / clawhub
Skill + Plugin Registry for OpenClaw
☆9,213 · Updated this week
Martian-Engineering / lossless-claw
Lossless Claw — LCM (Lossless Context Management) plugin for OpenClaw
☆4,894 · Updated this week
harbor-framework / harbor
Framework for evaluating and improving agents
☆3,475 · Updated this week
sierra-research / tau2-bench
τ-Bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
☆1,657 · Updated this week
ClawGym / ClawGym-Agents
☆33 · Jun 30, 2026 · Updated 3 weeks ago
VoltAgent / awesome-openclaw-skills
The awesome collection of OpenClaw skills. 5,400+ skills filtered and categorized from the official OpenClaw Skills Registry. 🦞
☆51,492 · Jul 16, 2026 · Updated last week
openclaw / openclaw
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
☆384,013 · Updated this week
SWE-bench / SWE-bench
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
☆5,482 · Apr 1, 2026 · Updated 3 months ago
aiming-lab / MetaClaw
🦞 Just talk to your agent — it learns and EVOLVES 🧬.
☆3,473 · Jun 7, 2026 · Updated last month
NousResearch / hermes-agent
The agent that grows with you
☆219,878 · Updated this week
Ayanami0730 / deep_research_bench
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
☆797 · May 11, 2026 · Updated 2 months ago
bytedance / deer-flow
An open-source long-horizon SuperAgent harness that researches, codes, and creates. With the help of sandboxes, memories, tools, skill, s…
☆77,785 · Updated this week
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,649 · Updated this week
GAIR-NLP / AgencyBench
[ACL2026 Main] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts
☆89 · Jan 23, 2026 · Updated 6 months ago
hesamsheikh / awesome-openclaw-usecases
A community collection of OpenClaw use cases for making life easier.
☆31,532 · Mar 24, 2026 · Updated 4 months ago
HKUDS / CLI-Anything
"CLI-Anything: Making ALL Software Agent-Native" -- CLI-Hub: https://clianything.cc/
☆46,000 · Jul 9, 2026 · Updated 2 weeks ago
HKUDS / nanobot
Lightweight, open-source AI agent for your tools, chats, and workflows.
☆46,189 · Updated this week
areal-project / AReaL
The RL Bridge for LLM-based Agent Applications. Made Simple & Flexible.
☆5,596 · Updated this week
rllm-org / rllm
Democratizing Reinforcement Learning for LLMs
☆5,727 · Updated this week
THUDM / slime
slime is an LLM post-training framework for RL Scaling.
☆7,621 · Updated this week
anthropics / skills
Public repository for Agent Skills
☆163,902 · Updated this week
karpathy / autoresearch
AI agents running research on single-GPU nanochat training automatically
☆91,965 · Mar 26, 2026 · Updated 3 months ago
Alibaba-NLP / DeepResearch
Tongyi Deep Research, the Leading Open-source Deep Research Agent
☆19,714 · Feb 27, 2026 · Updated 4 months ago
xirui-li / ClawEnvKit
Open-source environment toolkit for claw-like agents, supporting task/harness generation and evaluation
☆58 · May 7, 2026 · Updated 2 months ago
earendil-works / pi
AI agent toolkit: unified LLM API, agent loop, TUI, coding agent CLI
☆77,052 · Updated this week
Yummytanmo / SubtleMemory
A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents
☆20 · Jun 9, 2026 · Updated last month
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆87,069 · Updated this week
jackwener / OpenCLI
Turn any website into a CLI and let AI agents use your logged-in browser.
☆27,173 · Updated this week
sierra-research / tau-bench
Code and Data for Tau-Bench
☆1,344 · Mar 18, 2026 · Updated 4 months ago
opensandbox-group / OpenSandbox
Secure, Fast, and Extensible Sandbox runtime for AI agents.
☆12,158 · Updated this week
QwenLM / Qwen-Agent
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
☆16,841 · Mar 4, 2026 · Updated 4 months ago
MemTensor / MemOS
Self-evolving memory OS for LLM & AI Agents: ultra-persistent memory, hybrid-retrieval, and cross-task skill reuse, with 35.24% token sav…
☆10,362 · Updated this week