XiangLi1999/AutoBencher

☆33

Alternatives and similar repositories for AutoBencher

Users that are interested in AutoBencher are comparing it to the libraries listed below.

davidheineman / thresh
🌾 Universal, customizable and deployable fine-grained evaluation for text generation.
☆24 • Apr 22, 2026 • Updated 3 months ago
BunsenFeng / FactKB
Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023.
☆20 • Dec 25, 2023 • Updated 2 years ago
HowieHwong / DataGen
[ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models
☆69 • Mar 8, 2025 • Updated last year
facebookresearch / lss_eval
This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he…
☆31 • Aug 25, 2023 • Updated 2 years ago
amazon-science / fact-graph
Implementation of the paper "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations (NAACL 2022)"
☆52 • Jul 26, 2023 • Updated 3 years ago
VidCapBench / VidCapBench
☆13 • May 17, 2025 • Updated last year
ivanleomk / modal-grpo
☆19 • Mar 16, 2025 • Updated last year
mcxiaoxiao / QDA-SQL
[CICAI2026] Efficiently creating diverse multi-turn Text-to-SQL training samples in 3 steps! 🚀
☆15 • Jul 16, 2026 • Updated last week
LeonEricsson / llmjudge
Exploring limitations of LLM-as-a-judge
☆20 • Aug 17, 2024 • Updated last year
tsor13 / kaleido
☆24 • Mar 8, 2024 • Updated 2 years ago
xieyxclack / factual_coco
The implementation of <Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation> in PyTorch.
☆17 • Nov 11, 2021 • Updated 4 years ago
nttmdlab-nlp / ToMATO
ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind (AAAI2025)
☆20 • Apr 16, 2025 • Updated last year
tlringer / proof-chat-fun
playing with gpt4
☆13 • Mar 17, 2023 • Updated 3 years ago
runchu-tian / LongPiBench
The repository for the paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs"
☆14 • Dec 16, 2024 • Updated last year
tingofurro / shuffle_test
Codebase, data and models for the Re-Thinking the Shuffle Test paper at ACL2021
☆10 • Oct 14, 2022 • Updated 3 years ago
liziniu / GEM
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
☆58 • May 12, 2025 • Updated last year
TrustGen / TrustEval-toolkit
[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models.
☆132 • Aug 22, 2025 • Updated 11 months ago
Scikud / AnythingButWrappers
☆13 • May 7, 2023 • Updated 3 years ago
JungHoyoun / PromptCompressor
☆12 • Apr 29, 2024 • Updated 2 years ago
lyh6560new / P3Sum
The official code for the paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"
☆10 • Jun 23, 2024 • Updated 2 years ago
mchiquier / llm-mutate
☆15 • Oct 7, 2024 • Updated last year
jschuetzke / synthetic-spectra-benchmark
Benchmarking of 1D pattern classification networks
☆11 • Jul 19, 2023 • Updated 3 years ago
Jaredk3nt / phoenix-padding
Simple phoenix setup for padded window management
☆13 • Apr 25, 2018 • Updated 8 years ago
julianje / Bishop
Mental state inference from observable behavior
☆15 • Dec 3, 2021 • Updated 4 years ago
datarobot-community / symbolic-regression-python
Symbolic Regression from Scratch with Python
☆14 • Dec 6, 2022 • Updated 3 years ago
neulab / ToM-Language-Acquisition
Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind".
☆15 • Apr 27, 2023 • Updated 3 years ago
stellalisy / mediQ
☆43 • Jan 26, 2025 • Updated last year
vidhishanair / FactEdit
☆14 • Aug 30, 2023 • Updated 2 years ago
s-kumano / imagenet-superclass
An example of the correspondence between fine classes and superclasses (coarse classes) in ImageNet.
☆13 • Dec 4, 2024 • Updated last year
mcxiaoxiao / MMSQL
[IJCNN2025] MMSQL: Multi-turn Multi-type text-to-SQL test suite. Repository contains scripts, code, datasets in the paper "Evaluating and …
☆21 • Jul 15, 2026 • Updated last week
LzyFischer / BrainMAP
☆12 • Jan 20, 2025 • Updated last year
Nanne / ProtoSim
Code and instructions accompanying the ICCV'23 paper "Prototype-based Dataset Comparison"
☆18 • Dec 15, 2023 • Updated 2 years ago
YichenZW / Robust-Det
The code implementation of the paper Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks (A…
☆13 • Jul 16, 2024 • Updated 2 years ago
NewsStoriesData / newsstories.github.io
☆22 • Sep 20, 2022 • Updated 3 years ago
real-absolute-AI / Unnatural_Language
The official repository of 'Unnatural Languages Are Not Bugs but Features for LLMs'
☆24 • May 20, 2025 • Updated last year
ruc-datalab / SC-prompt
☆12 • May 13, 2023 • Updated 3 years ago
SALT-NLP / mic
Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems"
☆21 • Jul 18, 2023 • Updated 3 years ago
rycolab / kl-rb
This repository contains code for the paper "Better Estimation of the KL Divergence Between Language Models"
☆19 • May 30, 2025 • Updated last year
Raphaaal / fieldy
Fine-grained attention in hierarchical transformers for tabular time-series.
☆12 • Dec 24, 2024 • Updated last year