PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai
☆1,057 · Apr 25, 2026 · Updated this week
Alternatives and similar repositories for skill
Users that are interested in skill are comparing it to the libraries listed below.
- ☆21 · Sep 5, 2023 · Updated 2 years ago
- Claw-Eval is an evaluation harness for evaluating LLMs as agents. All tasks verified by humans. ☆510 · Updated this week
- An in-the-wild benchmark for AI agents in the OpenClaw Environment. ☆318 · Apr 21, 2026 · Updated last week
- CCS 2023 | Explainable malware and vulnerability detection with XAI in paper "FINER: Enhancing State-of-the-art Classifiers with Feature … ☆11 · Aug 20, 2024 · Updated last year
- AI-powered code reviews action. ☆15 · Jun 5, 2023 · Updated 2 years ago
- Scrapes problems from major online judges (OJs). ☆10 · Aug 28, 2017 · Updated 8 years ago
- ☆56 · Mar 13, 2026 · Updated last month
- [ACM MM 2025] MLLMs for Aesthetics Reasoning ☆25 · Jan 5, 2026 · Updated 3 months ago
- R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning ☆77 · May 25, 2025 · Updated 11 months ago
- Run LLMs on Apple devices with CoreML, optimized for Apple Neural Engine + GPU ☆100 · Updated this week
- ☆13 · Jan 22, 2025 · Updated last year
- A prompt collection for testing and evaluation of LLMs. ☆27 · Feb 24, 2026 · Updated 2 months ago
- DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery ☆20 · Sep 24, 2025 · Updated 7 months ago
- Evaluation harness for OpenHands V1. ☆75 · Updated this week
- Official repository for Flash Local Linear Attention ☆23 · Apr 23, 2026 · Updated last week
- ☆33 · May 27, 2025 · Updated 11 months ago
- Official code repository for the paper: AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Grap… ☆13 · Oct 30, 2024 · Updated last year
- daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently ☆38 · Feb 4, 2026 · Updated 2 months ago
- AgentRE-Bench is an agentic benchmark that evaluates state-of-the-art models on long-horizon reverse engineering tasks, measuring their a… ☆52 · Updated this week
- ☆11 · Oct 29, 2024 · Updated last year
- Create visual graphs out of your Notion workspace ☆11 · Aug 21, 2022 · Updated 3 years ago
- [AAAI'25] CharacterBench: Benchmarking Character Customization of Large Language Models ☆22 · Aug 1, 2025 · Updated 9 months ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens ☆19 · Feb 29, 2024 · Updated 2 years ago
- Deep Learning to Predict MLB Outcomes ☆10 · Aug 23, 2023 · Updated 2 years ago
- ☆11 · Mar 1, 2024 · Updated 2 years ago
- Synthesizing Fingerprint from Pattern Type Analysis Features using cGAN - WITC 2019 ☆12 · Apr 19, 2019 · Updated 7 years ago
- ☆115 · Sep 13, 2025 · Updated 7 months ago
- [ICML 2021] This is the official GitHub repo for training L_inf dist nets with high certified accuracy. ☆42 · Mar 16, 2022 · Updated 4 years ago
- ☆10 · Aug 19, 2023 · Updated 2 years ago
- A simple script for extracting plain text from arxiv dataset: https://www.kaggle.com/Cornell-University/arxiv ☆15 · Dec 7, 2020 · Updated 5 years ago
- ☆27 · Apr 14, 2025 · Updated last year
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆22 · Aug 5, 2024 · Updated last year
- Fine-tuning Mistral-7b with PEFT (Parameter Efficient Fine-Tuning) and LoRA (Low-Rank Adaptation) on Puffin Dataset (multi-turn conversation… ☆12 · Nov 23, 2023 · Updated 2 years ago
- ☆13 · Aug 18, 2022 · Updated 3 years ago
- A reference containing Styles and Keywords that you can use with MidJourney AI. There are also pages showing resolution comparison, image… ☆11 · Mar 23, 2023 · Updated 3 years ago
- The OlymMATH dataset ☆24 · Jun 1, 2025 · Updated 11 months ago
- A package for fine-tuning pretrained NLP transformers using Semi Supervised Learning ☆14 · Oct 27, 2021 · Updated 4 years ago
- Code for Teaching LM to Translate with Comparison ☆39 · Dec 15, 2023 · Updated 2 years ago
- The theory of mind module for the SWE agent ☆100 · Jan 13, 2026 · Updated 3 months ago