The agent benchmark that scores the full stack — harness, config, and model — not just the LLM. Trace-based scoring, reliability metrics, configuration diagnostics.
☆106 · Jun 2, 2026 · Updated last week
Alternatives and similar repositories for clawbench
Users who are interested in clawbench are comparing it to the libraries listed below.
- ☆14 · May 12, 2025 · Updated last year
- ☆36 · Nov 26, 2025 · Updated 6 months ago
- ☆34 · Feb 17, 2026 · Updated 3 months ago
- This repository presents an evaluation framework for speech-to-speech (S2S) models, following the methodology described in the EmphAsses … ☆25 · Jan 9, 2024 · Updated 2 years ago
- ☆36 · May 30, 2025 · Updated last year
- Ruanwei (软微) New Bible: what does Daxing really have to lose? ☆13 · Sep 18, 2022 · Updated 3 years ago
- Kubernetes operator for local LLM inference with llama.cpp, vLLM, TGI, and mlx-server: multi-GPU NVIDIA + Apple Silicon Metal, autoscali… ☆118 · Jun 2, 2026 · Updated last week
- Official implementation of our CVPR'22 paper. ☆13 · Nov 18, 2022 · Updated 3 years ago
- ☆28 · Jun 12, 2025 · Updated 11 months ago
- [EMNLP 2024] FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents ☆22 · Jan 6, 2025 · Updated last year
- Code for paper "Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication" ☆23 · Mar 30, 2024 · Updated 2 years ago
- ☆10 · May 25, 2023 · Updated 3 years ago
- A Solidity spec suite to test parsers for language compliance. ☆11 · Dec 31, 2017 · Updated 8 years ago
- Code for "Robust Pose Estimation in Crowded Scenes with Direct Pose-Level Inference", NeurIPS 2021 ☆15 · Dec 2, 2021 · Updated 4 years ago
- ☆218 · Dec 19, 2025 · Updated 5 months ago
- Discovering human interaction with novel objects via zero-shot learning, CVPR, 2020 ☆42 · Jul 14, 2020 · Updated 5 years ago
- All You Need to Know About Image Retrieval: a repo to automagically download datasets and run experiments ☆65 · Mar 18, 2025 · Updated last year
- ☆21 · Mar 5, 2025 · Updated last year
- ☆57 · May 28, 2024 · Updated 2 years ago
- fuzzy matching with Levenshtein, Damerau-Levenshtein, Bitap and n-gram ☆23 · Jul 31, 2025 · Updated 10 months ago
- ✏ Solidity support for VSCode ☆10 · Jan 11, 2023 · Updated 3 years ago
- FlashKDA: high-performance Kimi Delta Attention kernels ☆447 · May 26, 2026 · Updated 2 weeks ago
- ☆16 · Aug 5, 2022 · Updated 3 years ago
- Multitask NLU architecture for text and token classification tasks. ☆14 · Jan 7, 2023 · Updated 3 years ago
- A library for sending software performance metrics from Python libraries and apps to statsd. ☆31 · May 19, 2026 · Updated 3 weeks ago
- Toolkit for allowing inference and serving with MXNet in SageMaker. Dockerfiles used for building SageMaker MXNet Containers are at https… ☆29 · Sep 13, 2023 · Updated 2 years ago
- OWASP Zed Attack Proxy plugin for py.test ☆13 · Sep 10, 2015 · Updated 10 years ago
- Linux Security Module Stacking ☆10 · Apr 25, 2026 · Updated last month
- Aims for memory-efficient training (24GB VRAM) on consumer GPUs. Optimizing language models through guidance tokens in reasoning chains, … ☆28 · Feb 23, 2025 · Updated last year
- ☆13 · Apr 14, 2025 · Updated last year
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning ☆36 · Jun 10, 2025 · Updated 11 months ago
- A tiny search engine.