PinchBench is a benchmarking system for evaluating LLM models as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
★1,255 · Jun 2, 2026 · Updated last month
Alternatives and similar repositories for skill
Users that are interested in skill are comparing it to the libraries listed below.
- ★21 · Sep 5, 2023 · Updated 2 years ago
- Claw-Eval is an evaluation harness for evaluating LLM as agents. All tasks verified by humans. ★684 · May 17, 2026 · Updated last month
- CCS 2023 | Explainable malware and vulnerability detection with XAI in paper "FINER: Enhancing State-of-the-art Classifiers with Feature … ★12 · Aug 20, 2024 · Updated last year
- An in-the-wild benchmark for AI agents in the OpenClaw Environment. ★453 · Jun 25, 2026 · Updated last week
- ★12 · Jul 23, 2024 · Updated last year
- A Simple MLLM Surpassed QwenVL-Max with OpenSource Data Only in 14B LLM. ★38 · Sep 9, 2024 · Updated last year
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning ★80 · May 25, 2025 · Updated last year
- ★13 · Jan 22, 2025 · Updated last year
- A prompt collection for testing and evaluation of LLMs. ★26 · Jun 26, 2026 · Updated last week
- Semantic memory for AI agents. Local-first, MCP-native. ★35 · Mar 30, 2026 · Updated 3 months ago
- DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery ★21 · Sep 24, 2025 · Updated 9 months ago
- Evaluation harness for OpenHands V1. ★94 · Updated this week
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ★45 · Apr 30, 2023 · Updated 3 years ago
- ★32 · May 27, 2025 · Updated last year
- A pre-trained model with multi-exit transformer architecture. ★56 · Dec 10, 2022 · Updated 3 years ago
- Connect6 (Korean: 육목) for Python. ★11 · May 15, 2017 · Updated 9 years ago
- daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently ★38 · Feb 4, 2026 · Updated 5 months ago
- Rust library for conversion to/from proquints ★14 · Feb 9, 2018 · Updated 8 years ago
- Meet Rustacean GPT, an experimental project transforming OpenAi's GPT into a helpful, autonomous software engineer to support senior deve… ★14 · May 10, 2023 · Updated 3 years ago
- ★32 · May 21, 2026 · Updated last month
- The 4th rank system of the SemEval 2021 Task4. ★11 · May 7, 2022 · Updated 4 years ago
- [AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models ★23 · Aug 1, 2025 · Updated 11 months ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ★19 · Feb 29, 2024 · Updated 2 years ago
- Template repository with AI agent guardrails, safety protocols, and sprint task framework. For Claude, GPT, Gemini, and all LLMs. ★70 · Jun 16, 2026 · Updated 2 weeks ago
- Claude Code powered content engine for research, ideation, and drafting. Based on Dan Koe's content framework. ★58 · Jan 24, 2026 · Updated 5 months ago
- Run GEPA on your favorite non-python libraries. ★36 · Jan 22, 2026 · Updated 5 months ago
- Synthesizing Fingerprint from Pattern Type Analysis Features using cGAN - WITC 2019 ★12 · Apr 19, 2019 · Updated 7 years ago
- {DeepL, Google, WMT-Best, davinci-003, turbo, gpt-4} × {En-De, En-Cs, En-Ru, En-Zh, De-Fr, En-Ja, Uk-En, Uk-Cs, En-Hr, En-Ha, En-Is} ★14 · Jun 18, 2023 · Updated 3 years ago
- Rex Ying's Ph.D. Thesis, Stanford University ★42 · Jun 16, 2022 · Updated 4 years ago
- A simple ReAct agent that has access to LlamaIndex docs and to the internet to provide you with insights on LlamaIndex itself. ★11 · Feb 23, 2025 · Updated last year
- Evals for agents ★15 · Dec 4, 2024 · Updated last year
- The most intelligent context optimization engine for coding agents. Code-aware AST parsing across 17 languages. Command rewriting. Test, … ★28 · Updated this week
- Secure your selected route by using a middleware with static password for developers only ★11 · Jun 22, 2026 · Updated last week
- A package for fine tuning of pretrained NLP transformers using Semi Supervised Learning ★14 · Oct 27, 2021 · Updated 4 years ago
- The OlymMATH dataset ★25 · Jun 1, 2025 · Updated last year
- Code for Horizontal Federated Learning blog around Credit Scoring ★10 · Sep 14, 2020 · Updated 5 years ago
- SDLC enforcement for Claude Code: hooks, skills, and wizard setup in one command. TDD, planning, self-review, CI shepherd. ★31 · Updated this week
- Real-time monitoring and visualization for OpenCode agents. ★51 · Jan 31, 2026 · Updated 5 months ago
- Code and Data for the ACL21 paper "Modeling Bilingual Conversational Characteristics for Neural Chat Translation" ★12 · Dec 17, 2021 · Updated 4 years ago