Small, simple agent task environments for training and evaluation
☆19 · Nov 1, 2024 · Updated last year
Alternatives and similar repositories for SmallBench
Users that are interested in SmallBench are comparing it to the libraries listed below
- vLLM with support for span semantics ☆21 · Jan 26, 2026 · Updated last month
- Repository for tw.org site ☆14 · Feb 11, 2026 · Updated 2 weeks ago
- Evals meant to evaluate language models' ability to reason over long contexts. ☆10 · Sep 12, 2024 · Updated last year
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments ☆98 · Oct 3, 2025 · Updated 4 months ago
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆45 · Feb 15, 2024 · Updated 2 years ago
- Code for Columbia University COMS 3997 – LLM Ethics and Foundations ☆14 · Jan 7, 2025 · Updated last year
- An attribution library for LLMs ☆46 · Sep 17, 2024 · Updated last year
- ☆52 · Sep 18, 2024 · Updated last year
- ☆31 · Jan 18, 2025 · Updated last year
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆53 · Jul 9, 2025 · Updated 7 months ago
- Solving data for LLMs - Create quality synthetic datasets! ☆151 · Jan 20, 2025 · Updated last year
- Makes it easy to use altair from FastHTML ☆28 · Oct 9, 2024 · Updated last year
- ☆31 · Nov 10, 2024 · Updated last year
- ☆66 · Sep 13, 2025 · Updated 5 months ago
- 🤖 Headless IDE for AI agents ☆204 · Oct 9, 2025 · Updated 4 months ago
- List of papers on Self-Correction of LLMs. ☆80 · Dec 28, 2024 · Updated last year
- This repository contains source code and a high-quality test dataset for "Automated Commit Message Generation with Large Language Models.… ☆10 · Nov 6, 2025 · Updated 3 months ago
- ☆13 · Nov 5, 2024 · Updated last year
- CrewAI-Agentic-Jira: Enhance your Jira workflows with intelligent agent-driven automation. Powered by the CrewAI framework, this project … ☆21 · Feb 3, 2025 · Updated last year
- A Claude Code plugin that solves the same problems as community frameworks (GSD, BMAD, Ralph, Agent OS), but using the tool's native arc… ☆26 · Updated this week
- ☆40 · May 14, 2025 · Updated 9 months ago
- End-to-end Generative Optimization for AI Agents ☆708 · Dec 10, 2025 · Updated 2 months ago
- Collection of papers for scalable automated alignment. ☆93 · Oct 22, 2024 · Updated last year
- Collection of specialized agent definitions for Claude Code ☆32 · Feb 2, 2026 · Updated 3 weeks ago
- MLX Implementation of Recursive Reasoning with Tiny Networks ☆78 · Oct 11, 2025 · Updated 4 months ago
- A minimal tool to generate and validate datasets. ☆26 · Feb 19, 2026 · Updated last week
- The JOS from MIT open course ☆11 · Dec 21, 2011 · Updated 14 years ago
- Community Eventing and Scripting examples ☆18 · Aug 11, 2025 · Updated 6 months ago
- CodeSnippet WebSite ☆10 · Oct 24, 2017 · Updated 8 years ago
- Provides for deploying custom ETL containers on AIStore, with subsequent user-defined extraction-transformation-loading in parallel, on t… ☆19 · Nov 26, 2025 · Updated 3 months ago
- Rasa Ephemeral Installer ☆13 · Jun 9, 2022 · Updated 3 years ago
- A blog site that allows anyone to create a blog and share it with only a link. ☆13 · Dec 8, 2025 · Updated 2 months ago
- Replication package of the ICSE2025 paper titled "Leveraging Large Language Models for Enhancing the Understandability of Generated Unit … ☆11 · Feb 19, 2025 · Updated last year
- FamilyTool benchmark ☆12 · Sep 10, 2025 · Updated 5 months ago
- ☆13 · May 22, 2024 · Updated last year
- In ancient Egypt the pelican was believed to possess the ability to prophesy safe passage in the underworld. Pelicans are ferocious eater… ☆11 · Apr 7, 2023 · Updated 2 years ago
- The Dolby.io Communications C++ SDK provides both Client and Server applications the ability to create HD voice and video for fully immer… ☆13 · Aug 30, 2024 · Updated last year
- Tool for generating summary stat reports and graphs from Gerrit (https://www.gerritcodereview.com/) and GitHub Enterprise review and pull… ☆12 · May 14, 2019 · Updated 6 years ago
- ☆10 · Sep 25, 2019 · Updated 6 years ago