ServiceNow / agent-poirot
☆15 · Updated 5 months ago
Alternatives and similar repositories for agent-poirot
Users interested in agent-poirot are comparing it to the libraries listed below.
- Code for Language-Interfaced FineTuning for Non-Language Machine Learning Tasks. ☆130 · Updated 11 months ago
- ☆77 · Updated last year
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? ☆213 · Updated last week
- A small library of LLM judges ☆294 · Updated 2 months ago
- ☆53 · Updated last week
- Discovering Data-driven Hypotheses in the Wild ☆113 · Updated 4 months ago
- State-of-the-art paired encoder and decoder models (17M-1B params) ☆50 · Updated 2 months ago
- PyTorch library for Active Fine-Tuning ☆93 · Updated 3 weeks ago
- ☆48 · Updated last year
- Research on Tabular Foundation Models ☆58 · Updated 10 months ago
- Benchmarking Large Language Models ☆99 · Updated 3 months ago
- Flexible library for merging large language models (LLMs) via evolutionary optimization (ACL 2025 Demo). ☆88 · Updated 2 months ago
- Efficient multi-prompt evaluation of LLMs ☆22 · Updated 10 months ago
- ☆109 · Updated 8 months ago
- Efficiently find the best-suited language model (LM) for your NLP task ☆127 · Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated last year
- EvalAssist is an open-source project that simplifies using large language models as evaluators (LLM-as-a-Judge) of the output of other la… ☆89 · Updated last week
- LLM Attributor: Attribute LLM's Generated Text to Training Data ☆63 · Updated last month
- This is the official repository for HypoGeniC (Hypothesis Generation in Context) and HypoRefine, which are automated, data-driven tools t… ☆89 · Updated 3 weeks ago
- LangFair is a Python library for conducting use-case level LLM bias and fairness assessments ☆236 · Updated last week
- ☆36 · Updated 2 years ago
- Evaluating LLMs with fewer examples ☆163 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆136 · Updated 3 months ago
- ☆80 · Updated this week
- PyTorch implementation for MRL ☆19 · Updated last year
- ☆55 · Updated 2 years ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 · Updated 11 months ago
- Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding. ☆42 · Updated 7 months ago
- A mechanistic approach for understanding and detecting factual errors of large language models. ☆46 · Updated last year
- Evaluation of neuro-symbolic engines ☆39 · Updated last year