patronus-ai/trail-benchmark

☆21

Alternatives and similar repositories for trail-benchmark

Users who are interested in trail-benchmark are comparing it to the libraries listed below.

TraceElephant / TraceElephant
Repo of "Seeing the Whole Elephant: A Benchmark for Failure Attribution in LLM-based Multi-Agent Systems" (ACL 2026)
☆16 • Apr 27, 2026 • Updated 2 months ago
microsoft / ACV
A series of work towards achieving ACV.
☆38 • Apr 20, 2026 • Updated 3 months ago
ulab-uiuc / AgentDebug
☆97 • Mar 30, 2026 • Updated 3 months ago
ag2ai / Agents_Failure_Attribution
Benchmark for automated failure attributions in agentic systems (🏆 ICML 2025 Spotlight)
☆381 • Feb 11, 2026 • Updated 5 months ago
dessertlab / Fault-Injection-Dataset
Failure dataset accompanying the paper "How Bad Can a Bug Get? An Empirical Analysis of Software Failures in the OpenStack Cloud Computi…
☆10 • Jun 12, 2020 • Updated 6 years ago
dsrg-uoft / LangBench
LangBench applications and scripts
☆14 • Jun 7, 2023 • Updated 3 years ago
kfq20 / AEGIS
AEGIS: Automated Error Generation and Attribution for Multi-Agent Systems
☆25 • Feb 28, 2026 • Updated 4 months ago
wzq016 / PINE
Official repo of the paper "Eliminating Position Bias of Language Models: A Mechanistic Approach"
☆23 • Jun 13, 2025 • Updated last year
bertiev / SimpleSafetyTests
☆19 • Mar 25, 2024 • Updated 2 years ago
yubol-bobo / MT-Consistency
This repo investigates LLMs' tendency to exhibit acquiescence bias in sequential QA interactions. Includes evaluation methods, datasets, …
☆17 • Apr 24, 2026 • Updated 3 months ago
multi-agent-systems-failure-taxonomy / MAST
☆395 • Jul 23, 2025 • Updated last year
dykang / cgraph
Dataset for Detecting and Explaining Causes From Text For a Time Series Event, EMNLP'17
☆15 • Aug 31, 2020 • Updated 5 years ago
aengusl / spawrious
☆32 • Mar 1, 2024 • Updated 2 years ago
kkkevinkkkkk / situated_faithfulness
☆14 • Oct 17, 2024 • Updated last year
ethan-w-roland / AUNN
Simple implementation of Gwern's AUNN proposal
☆15 • Oct 5, 2025 • Updated 9 months ago
BaizeAI / kcover
🧯 Kubernetes coverage for fault awareness and recovery, works for any LLMOps, MLOps, AI workloads.
☆35 • Updated this week
IBM / LLM-performance-prediction
Predict the performance of LLM inference services
☆23 • Sep 18, 2025 • Updated 10 months ago
bespokelabsai / awesome-rl
☆18 • Apr 11, 2025 • Updated last year
zjukg / CCKS2024_CGQA
☆11 • May 17, 2024 • Updated 2 years ago
TREMA-UNH / rubric-grading-workbench
A Workbench for Autograding Retrieve/Generate Systems
☆15 • Jun 30, 2025 • Updated last year
tuhinjubcse / SimileGeneration-EMNLP2020
Code for SCOPE (Style transfer through COmmonsense PropErty), a style transfer approach to convert literal sentences to similes
☆19 • Apr 18, 2021 • Updated 5 years ago
rvenet / RVENet
Source code related to the research paper entitled RVENet: A Large Echocardiographic Dataset for the Deep Learning-Based Assessment of Ri…
☆12 • Mar 10, 2024 • Updated 2 years ago
kwaipilot / SWE-Compass
☆18 • Mar 28, 2026 • Updated 3 months ago
kiyan-rezaee / Systematic-Literature-Review-on-Online-Continual-Learning
☆14 • Jan 10, 2025 • Updated last year
ddhruvkr / CONTRADOC
☆13 • Feb 8, 2025 • Updated last year
boostcampaitech2 / final-project-level3-nlp-02
final-project-level3-nlp-02 created by GitHub Classroom
☆11 • Dec 31, 2021 • Updated 4 years ago
scale-snu / layered-prefill
Layered prefill changes the scheduling axis from tokens to layers and removes redundant MoE weight reloads while keeping decode stall fre…
☆18 • Mar 9, 2026 • Updated 4 months ago
MINE-USTC / Xiangqi-R1
Code for the paper Xiangqi-R1: Enhancing Spatial Strategic Reasoning in LLMs for Chinese Chess via Reinforcement Learning
☆15 • Jul 23, 2025 • Updated last year
caiqizh / LUQ
☆14 • Jan 14, 2026 • Updated 6 months ago
JL-Cheng / SERE
[ICLR 2026] SERE: Similarity-Based Expert Re-routing for Efficient Batch Decoding in MoE Models
☆18 • Feb 4, 2026 • Updated 5 months ago
AaltoPML / human-in-the-loop-predictive-maintenance
☆10 • Jun 4, 2024 • Updated 2 years ago
pjzj220113 / chinese-sarcasm-calculation
Welcome to the Chinese sarcasm computation evaluation task!
☆14 • Nov 4, 2024 • Updated last year
keanudicap / MSQA
Microsoft question-answering dataset
☆10 • Jun 16, 2023 • Updated 3 years ago
coinse / autofl
☆33 • Jan 14, 2025 • Updated last year
NYCU-EDgeAi / subspec
[NeurIPS 2025] Speculate Deep and Accurate
☆22 • Jan 16, 2026 • Updated 6 months ago
xqlin98 / Fair-yet-Equal-CML
Official implementation of the ICML 2023 paper "Fair yet Asymptotically Equal Collaborative Learning"
☆10 • May 29, 2023 • Updated 3 years ago
ml-lab-htw / llm-trees
Official repo: “Oh LLM, I’m Asking Thee, Please Give Me a Decision Tree”: Zero-Shot Decision Tree Induction and Embedding with Large Lang…
☆16 • Jul 17, 2026 • Updated last week
styfeng / GenAug
Code for GenAug: Data Augmentation for Finetuning Text Generators.
☆28 • Oct 8, 2021 • Updated 4 years ago
llylly / RANUM
[ICSE 2023] Differentiable interpretation and failure-inducing input generation for neural network numerical bugs.
☆13 • Jan 5, 2024 • Updated 2 years ago