kixlab / EvalLM
Interactive environment for evaluating LLM prompts on natural language criteria.
☆24Updated 6 months ago
Alternatives and similar repositories for EvalLM
Users who are interested in EvalLM are comparing it to the libraries listed below.
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)☆82Updated 4 months ago
- A toolkit for building computer use AI agents☆169Updated 3 weeks ago
- Tutorial for building LLM router☆219Updated last year
- ☆78Updated 8 months ago
- Kura is a simple reproduction of the CLIO paper which uses language models to label user behaviour before clustering them based on embedd…☆252Updated 3 weeks ago
- ☆148Updated 2 weeks ago
- A list of AI memory projects☆176Updated 6 months ago
- Synthetic Data for LLM Fine-Tuning☆119Updated last year
- ☆154Updated last week
- ☆195Updated last year
- Turn a GitHub Repo's contents into a big prompt for long-context models like Claude 3 Opus.☆217Updated 5 months ago
- Task-based Agentic Framework using StrictJSON as the core☆454Updated last week
- An agent benchmark with tasks in a simulated software company.☆494Updated 2 weeks ago
- A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments☆85Updated 9 months ago
- An Awesome list of curated DSPy resources.☆382Updated 5 months ago
- A framework for generative software.☆113Updated 2 weeks ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond)☆448Updated 9 months ago
- Attribute (or cite) statements generated by LLMs back to in-context information.☆254Updated 9 months ago
- Scrapybara Python SDK☆70Updated last month
- An automated tool for discovering insights from research paper corpora☆138Updated last year
- A library of reasoning algorithms for agents☆257Updated last month
- Inference-time scaling for LLMs-as-a-judge.☆258Updated last week
- Research repository on interfacing LLMs with Weaviate APIs. Inspired by the Berkeley Gorilla LLM.☆132Updated last month
- 📝 Automatically annotate papers using LLMs☆331Updated 3 months ago
- Python SDK for running evaluations on LLM generated responses☆289Updated last month
- Together Open Deep Research☆321Updated 3 months ago
- A simple Python sandbox for helpful LLM data agents☆275Updated last year
- A system that tries to resolve all issues on a GitHub repo with OpenHands.☆110Updated 8 months ago
- Solving data for LLMs - Create quality synthetic datasets!☆150Updated 6 months ago
- Your automated SWE fleet to get your tickets from the Backlog to Prod!☆98Updated last year