★32 · Jul 11, 2024 · Updated last year
Alternatives and similar repositories for AutoBencher
Users that are interested in AutoBencher are comparing it to the libraries listed below.
- Universal, customizable and deployable fine-grained evaluation for text generation. ★24 · Oct 26, 2023 · Updated 2 years ago
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025) ★26 · Feb 25, 2025 · Updated last year
- [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models ★66 · Mar 8, 2025 · Updated last year
- Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023. ★20 · Dec 25, 2023 · Updated 2 years ago
- Implementation of the paper "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations (NAACL 2022)" ★50 · Jul 26, 2023 · Updated 2 years ago
- ★13 · May 17, 2025 · Updated 10 months ago
- Exploring limitations of LLM-as-a-judge ★20 · Aug 17, 2024 · Updated last year
- ★23 · Mar 8, 2024 · Updated 2 years ago
- Measuring if attention is explanation with ROAR ★22 · Mar 3, 2023 · Updated 3 years ago
- [ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models. ★128 · Aug 22, 2025 · Updated 7 months ago
- ★31 · Nov 16, 2025 · Updated 5 months ago
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ★17 · Jun 28, 2025 · Updated 9 months ago
- Mental state inference from observable behavior ★15 · Dec 3, 2021 · Updated 4 years ago
- ★14 · Aug 30, 2023 · Updated 2 years ago
- Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind". ★15 · Apr 27, 2023 · Updated 2 years ago
- ★40 · Jan 26, 2025 · Updated last year
- JPEG-LM: LLMs as Image Generators with Canonical Codec Representations ★15 · Sep 29, 2024 · Updated last year
- Code listing for the paper 'SATAR: A Self-supervised Approach to Twitter Account Representation Learning and its Application in Bot Detec… ★10 · Nov 1, 2021 · Updated 4 years ago
- Code and instructions accompanying ICCV'23 paper Prototype-based Dataset Comparison ★18 · Dec 15, 2023 · Updated 2 years ago
- ★16 · Dec 14, 2023 · Updated 2 years ago
- The code implementation of the paper Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks (A… ★13 · Jul 16, 2024 · Updated last year
- Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths ★18 · Jul 10, 2025 · Updated 9 months ago
- Code for the paper "Symmetric Machine Theory of Mind", presented at ICML 2022. ★12 · Jul 18, 2022 · Updated 3 years ago
- This repo contains the ToMnet+ model for preference inference. Developed by Yun-Shiuan, Edwinn, Hsin-Yi, and Elaine. ★10 · Feb 24, 2023 · Updated 3 years ago
- Resolving Knowledge Conflicts in Large Language Models, COLM 2024 ★18 · Oct 7, 2025 · Updated 6 months ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ★14 · Aug 19, 2025 · Updated 7 months ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ★31 · Aug 25, 2023 · Updated 2 years ago
- ★21 · Feb 10, 2025 · Updated last year
- ★11 · Jan 3, 2024 · Updated 2 years ago
- This repository includes the code implementation of the paper Improving Pacing in Long-Form Story Planning by Yichen Wang, Kevin Yang, Xi… ★17 · Nov 19, 2024 · Updated last year
- ★12 · May 13, 2023 · Updated 2 years ago
- [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models ★39 · Jul 19, 2024 · Updated last year
- [ICML25] CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale ★25 · Jul 31, 2025 · Updated 8 months ago
- Headless Slay the Spire 2 CLI: play the full game from a terminal. ★156 · Apr 1, 2026 · Updated 2 weeks ago
- Repository for the ACL 2024 conference website ★18 · Feb 3, 2025 · Updated last year
- This repository contains the joint use of CPO and SimPO method for better reference-free preference learning methods. ★56 · Aug 13, 2024 · Updated last year
- ★19 · Feb 3, 2022 · Updated 4 years ago
- Framework for unified summarisation and evaluation of English documents using state-of-the-art models and measures. ★33 · May 13, 2024 · Updated last year
- ★14 · Apr 16, 2024 · Updated 2 years ago