Small, simple agent task environments for training and evaluation
☆19 · Nov 1, 2024 · Updated last year
Alternatives and similar repositories for SmallBench
Users who are interested in SmallBench are comparing it to the libraries listed below.
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆45 · Feb 15, 2024 · Updated 2 years ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆99 · Oct 3, 2025 · Updated 6 months ago
- Repository for tw.org site ☆14 · Mar 30, 2026 · Updated last week
- vLLM with support for span semantics ☆22 · Feb 27, 2026 · Updated last month
- Jason Meridth's blog ☆13 · Apr 2, 2026 · Updated last week
- Evals meant to evaluate language models' ability to reason over long contexts. ☆10 · Sep 12, 2024 · Updated last year
- Official PyTorch Implementation for Meaning Representations from Trajectories in Autoregressive Models (ICLR 2024) ☆22 · May 14, 2024 · Updated last year
- Solving data for LLMs - Create quality synthetic datasets! ☆151 · Jan 20, 2025 · Updated last year
- Code for Columbia University COMS 3997 – LLM Ethics and Foundations ☆14 · Jan 7, 2025 · Updated last year
- ☆28 · Sep 23, 2025 · Updated 6 months ago
- A Python library to orchestrate LLMs in a neural network-inspired structure ☆53 · Oct 4, 2024 · Updated last year
- Community Eventing and Scripting examples ☆19 · Aug 11, 2025 · Updated 7 months ago
- An attribution library for LLMs ☆46 · Sep 17, 2024 · Updated last year
- Code of CropMix: Sampling a Rich Input Distribution via Multi-Scale Cropping ☆17 · Oct 8, 2022 · Updated 3 years ago
- Makes it easy to use altair from FastHTML ☆28 · Oct 9, 2024 · Updated last year
- ☆53 · Sep 18, 2024 · Updated last year
- ☆66 · Sep 13, 2025 · Updated 6 months ago
- Automated hyperparameter search for optimal Gabliteration configurations on large language models ☆46 · Mar 10, 2026 · Updated 3 weeks ago
- LobotoMl is a set of scripts and tools to assess production deployments of ML services ☆10 · May 16, 2022 · Updated 3 years ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆29 · Dec 10, 2024 · Updated last year
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆63 · Apr 10, 2024 · Updated last year
- Replication package for evaluation of code generation metrics ☆17 · Nov 24, 2025 · Updated 4 months ago
- An HTTP proxy that naively injects NTLM data for the current user into outgoing requests ☆14 · Nov 14, 2018 · Updated 7 years ago
- ☆31 · Jan 18, 2025 · Updated last year
- Python client for the Open eXecution Protocol (OXP) ☆17 · May 16, 2025 · Updated 10 months ago
- Workers + Stytch TODO App MCP Server ☆27 · Mar 12, 2026 · Updated 3 weeks ago
- End-to-end Generative Optimization for AI Agents ☆714 · Dec 10, 2025 · Updated 3 months ago
- A Data Source for Reasoning Embodied Agents ☆19 · Sep 18, 2023 · Updated 2 years ago
- Link Python logic with Svelte interfaces for simple demos ☆13 · Jan 9, 2025 · Updated last year
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆53 · Jul 9, 2025 · Updated 9 months ago
- ☆18 · May 6, 2023 · Updated 2 years ago
- TRITONCACHE implementation of a Redis cache ☆17 · Mar 13, 2026 · Updated 3 weeks ago
- A Jupyter client for your terminal ☆24 · Jan 3, 2026 · Updated 3 months ago
- ☆28 · Oct 22, 2024 · Updated last year
- Arxiv + Notion Sync ☆20 · May 12, 2025 · Updated 10 months ago
- ☆13 · Jul 14, 2024 · Updated last year
- BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions ☆25 · Aug 8, 2024 · Updated last year
- KMD is a collection of conversational exchanges between patients and doctors on various medical topics. It aims to capture the intricaci… ☆24 · Nov 15, 2023 · Updated 2 years ago
- DSPy: The framework for programming with foundation models ☆13 · Aug 24, 2023 · Updated 2 years ago