☆32 · Jul 11, 2024 · Updated last year
Alternatives and similar repositories for AutoBencher
Users that are interested in AutoBencher are comparing it to the libraries listed below.
- Universal, customizable and deployable fine-grained evaluation for text generation. ☆24 · Apr 22, 2026 · Updated last month
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025) ☆27 · Feb 25, 2025 · Updated last year
- Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023. ☆20 · Dec 25, 2023 · Updated 2 years ago
- Implementation of the paper "FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations (NAACL 2022)" ☆52 · Jul 26, 2023 · Updated 2 years ago
- Exploring limitations of LLM-as-a-judge ☆20 · Aug 17, 2024 · Updated last year
- ☆24 · Mar 8, 2024 · Updated 2 years ago
- The implementation of <Factual Consistency Evaluation for Text Summarization via Counterfactual Estimation> in PyTorch. ☆17 · Nov 11, 2021 · Updated 4 years ago
- Measuring if attention is explanation with ROAR ☆22 · Mar 3, 2023 · Updated 3 years ago
- [ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models. ☆131 · Aug 22, 2025 · Updated 9 months ago
- ☆34 · Nov 16, 2025 · Updated 6 months ago
- ☆14 · Oct 7, 2024 · Updated last year
- ScienceMeter: Tracking Scientific Knowledge Updates in Language Models ☆17 · Jun 28, 2025 · Updated 11 months ago
- Simple phoenix setup for padded window management ☆13 · Apr 25, 2018 · Updated 8 years ago
- Mental state inference from observable behavior ☆15 · Dec 3, 2021 · Updated 4 years ago
- ☆14 · Aug 30, 2023 · Updated 2 years ago
- Code used to run experiments for the ICLR 2023 paper "Computational Language Acquisition with Theory of Mind". ☆15 · Apr 27, 2023 · Updated 3 years ago
- The example of correspondence between fine classes and superclasses (coarse classes) in ImageNet. ☆13 · Dec 4, 2024 · Updated last year
- Code listing for the paper 'SATAR: A Self-supervised Approach to Twitter Account Representation Learning and its Application in Bot Detec… ☆10 · Nov 1, 2021 · Updated 4 years ago
- Code and instructions accompanying ICCV'23 paper Prototype-based Dataset Comparison ☆18 · Dec 15, 2023 · Updated 2 years ago
- ☆16 · Dec 14, 2023 · Updated 2 years ago
- The code implementation of the paper Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks (A… ☆13 · Jul 16, 2024 · Updated last year
- Check your grade automatically and send e-mail when new grade comes ☆12 · Feb 7, 2018 · Updated 8 years ago
- Repository for the COLM 2025 paper SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths ☆19 · Jul 10, 2025 · Updated 11 months ago
- Resolving Knowledge Conflicts in Large Language Models, COLM 2024 ☆18 · Oct 7, 2025 · Updated 8 months ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ☆14 · Aug 19, 2025 · Updated 9 months ago
- ☆13 · Jul 25, 2023 · Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 · Aug 25, 2023 · Updated 2 years ago
- ☆21 · Feb 10, 2025 · Updated last year
- ☆11 · Jan 3, 2024 · Updated 2 years ago
- ☆12 · May 13, 2023 · Updated 3 years ago
- [ACL'24] A Knowledge-grounded Interactive Evaluation Framework for Large Language Models ☆40 · Jul 19, 2024 · Updated last year
- This repository contains the joint use of the CPO and SimPO methods for better reference-free preference learning. ☆58 · Aug 13, 2024 · Updated last year
- This repository includes the code implementation of the paper Improving Pacing in Long-Form Story Planning by Yichen Wang, Kevin Yang, Xi… ☆17 · Nov 19, 2024 · Updated last year
- ☆19 · Feb 3, 2022 · Updated 4 years ago
- ☆14 · Apr 16, 2024 · Updated 2 years ago
- Kubernetes Tutorial for the PS2 group meetings at UC Berkeley ☆16 · Mar 23, 2023 · Updated 3 years ago
- Code accompanying ICML 2021 paper "Few-shot Language Coordination by Modeling Theory of Mind" ☆18 · May 18, 2022 · Updated 4 years ago
- ☆23 · Jan 25, 2023 · Updated 3 years ago
- [NeurIPS 2023] PyTorch code for Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind ☆66 · Dec 21, 2023 · Updated 2 years ago