Small, simple agent task environments for training and evaluation
☆19Nov 1, 2024Updated last year
Alternatives and similar repositories for SmallBench
Users who are interested in SmallBench are comparing it to the libraries listed below.
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks.☆45Feb 15, 2024Updated 2 years ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆102Oct 3, 2025Updated 8 months ago
- Jason Meridth's blog☆14May 28, 2026Updated last week
- Evals meant to evaluate language models' ability to reason over long contexts.☆10Sep 12, 2024Updated last year
- SWE Arena☆36Jul 6, 2025Updated 11 months ago
- Solving data for LLMs - Create quality synthetic datasets!☆151Jan 20, 2025Updated last year
- ☆29Sep 23, 2025Updated 8 months ago
- A Python library to orchestrate LLMs in a neural network-inspired structure☆53Oct 4, 2024Updated last year
- An attribution library for LLMs☆46Sep 17, 2024Updated last year
- Code of CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping☆17Oct 8, 2022Updated 3 years ago
- ☆54Sep 18, 2024Updated last year
- Explore and Control with Adversarial Surprise☆10Jul 20, 2021Updated 4 years ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…☆30Dec 10, 2024Updated last year
- A Kubernetes operator for managing Prefect servers and work pools☆17Updated this week
- ☆15Sep 30, 2022Updated 3 years ago
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models☆63Apr 10, 2024Updated 2 years ago
- Help protect against malicious build scripts☆29May 31, 2026Updated last week
- Understanding deep networks and large models.☆29Jan 23, 2026Updated 4 months ago
- ☆31Jan 18, 2025Updated last year
- Python client for the Open eXecution Protocol (OXP)☆17May 16, 2025Updated last year
- End-to-end Generative Optimization for AI Agents☆739Dec 10, 2025Updated 5 months ago
- Link Python logic with Svelte interfaces for simple demos☆14Jan 9, 2025Updated last year
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!☆54Jul 9, 2025Updated 11 months ago
- ☆136May 26, 2026Updated 2 weeks ago
- ☆29Oct 22, 2024Updated last year
- ☆40May 14, 2025Updated last year
- Arxiv + Notion Sync☆20May 12, 2025Updated last year
- ☆13Jul 14, 2024Updated last year
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions☆25Aug 8, 2024Updated last year
- KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricaci…☆24Nov 15, 2023Updated 2 years ago
- [NAACL 2024] CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions☆13May 7, 2024Updated 2 years ago
- Minimal (truly) muP implementation, consistent with TP4 and TP5 papers notation☆14Jan 2, 2026Updated 5 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI☆106Mar 6, 2025Updated last year
- A framework for creating message-driven training systems with PyTorch☆21Oct 7, 2025Updated 8 months ago
- ☆10Nov 10, 2021Updated 4 years ago
- Adaptation of gxemul to support the CHERI MIPS unit test suite and certain CHERI features☆16Dec 8, 2015Updated 10 years ago
- 🤖 Headless IDE for AI agents☆204Oct 9, 2025Updated 8 months ago
- MLX Implementation of Recursive Reasoning with Tiny Networks☆78Oct 11, 2025Updated 7 months ago
- GenericInjector for win32 programs☆12Jun 19, 2017Updated 8 years ago