JoshuaPurtell/SmallBench

Small, simple agent task environments for training and evaluation

☆20

Alternatives and similar repositories for SmallBench

Users who are interested in SmallBench are comparing it to the libraries listed below.

neoxelox / dspy-inspector
DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks.
☆45 · Feb 15, 2024 · Updated 2 years ago
GothenburgBitFactory / tw.org
Repository for tw.org site
☆14 · Updated this week
zbambergerNLP / strategic-debate-tot
A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments
☆103 · Oct 3, 2025 · Updated 9 months ago
furlat / Abstractions
A Collection of Pydantic Models to Abstract IRL
☆41 · Dec 10, 2025 · Updated 7 months ago
JoshuaPurtell / LRCBench
Evals meant to evaluate language models' ability to reason over long contexts.
☆10 · Sep 12, 2024 · Updated last year
tianyu139 / meaning-as-trajectories
Official PyTorch Implementation for Meaning Representations from Trajectories in Autoregressive Models (ICLR 2024)
☆22 · May 14, 2024 · Updated 2 years ago
IBM / vllm
vLLM with support for span semantics
☆25 · Feb 27, 2026 · Updated 4 months ago
mustafamariam / LLM-Connections-Solver
Code for Columbia University COMS 3997 – LLM Ethics and Foundations
☆16 · Jan 7, 2025 · Updated last year
BhabhaAI / dataformer
Solving data for LLMs - Create quality synthetic datasets!
☆152 · Jan 20, 2025 · Updated last year
leap-laboratories / PIZZA
An attribution library for LLMs
☆46 · Sep 17, 2024 · Updated last year
koaning / fh-altair
Makes it easy to use altair from FastHTML
☆28 · Oct 9, 2024 · Updated last year
raga-ai-hub / ragaai-catalyst-v1
☆31 · Jan 18, 2025 · Updated last year
autogenai / easy-problems-that-llms-get-wrong
☆53 · Sep 18, 2024 · Updated last year
NickNameInvalid / LLM_CTF
☆66 · Sep 13, 2025 · Updated 10 months ago
alkaet / LobotoMl
LobotoMl is a set of scripts and tools to assess production deployments of ML services
☆10 · May 16, 2022 · Updated 4 years ago
ArnaudFickinger / adversarial-surprise
Explore and Control with Adversarial Surprise
☆10 · Jul 20, 2021 · Updated 4 years ago
bigcode-project / astraios
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
☆63 · Apr 10, 2024 · Updated 2 years ago
sonibla / pytorch_keras_converter
☆15 · Sep 30, 2022 · Updated 3 years ago
Tebmer / Rereading-LLM-Reasoning
EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…
☆30 · Dec 10, 2024 · Updated last year
antiguru / flatcontainer
A flat container abstraction for Rust
☆17 · Nov 24, 2025 · Updated 7 months ago
devvrit / ScaleRL-Curve-Fitting
ScaleRL Curve Fitting
☆17 · Oct 13, 2025 · Updated 9 months ago
dspy-community / dspy-session
☆27 · Feb 26, 2026 · Updated 4 months ago
bradAGI / DSPy-Stock-News-Sentiment-Analyzer
☆23 · Oct 22, 2025 · Updated 8 months ago
sutro-sh / sutro
Analyze and generate unstructured data using LLMs, from quick experiments to billion token jobs.
☆17 · Jun 19, 2026 · Updated last month
OpenExecProtocol / oxp-python
Python client for the Open eXecution Protocol (OXP)
☆17 · May 16, 2025 · Updated last year
mike-rogers / NtlmProxy
An HTTP proxy that naively injects NTLM data for the current user into outgoing requests
☆14 · Nov 14, 2018 · Updated 7 years ago
Archelunch / vibe-dspy
☆55 · Aug 22, 2025 · Updated 10 months ago
ajskateboarder / spylt
Link Python logic with Svelte interfaces for simple demos
☆14 · Jan 9, 2025 · Updated last year
agential-ai / agential
🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!
☆54 · Jul 9, 2025 · Updated last year
microsoft / Trace
End-to-end Generative Optimization for AI Agents
☆748 · Jun 17, 2026 · Updated last month
triton-inference-server / redis_cache
TRITONCACHE implementation of a Redis cache
☆17 · Updated this week
remichu-ai / gallama
☆137 · Jun 30, 2026 · Updated 2 weeks ago
entropy-flux / TorchSystem
A framework for creating message-driven training systems with PyTorch
☆21 · Oct 7, 2025 · Updated 9 months ago
knowrohit / know_medical_dialogues
KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricaci…
☆24 · Nov 15, 2023 · Updated 2 years ago
facebookresearch / NeuralMemory
A Data Source for Reasoning Embodied Agents
☆20 · Sep 18, 2023 · Updated 2 years ago
hwchase17 / dspy
DSPy: The framework for programming with foundation models
☆13 · Aug 24, 2023 · Updated 2 years ago
X-LANCE / text2sql-multiturn-GPT
[NAACL 2024] CoE-SQL: In-Context Learning for Multi-Turn Text-to-SQL with Chain-of-Editions
☆13 · May 7, 2024 · Updated 2 years ago
GAIR-NLP / OlympicArena
[NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
☆106 · Mar 6, 2025 · Updated last year
haizelabs / bijection-learning
☆29 · Oct 22, 2024 · Updated last year