A lightweight library for Bayesian analysis of LLM evals (ICML 2025 Spotlight Position Paper)
☆25 · May 28, 2025 · Updated 11 months ago
Alternatives and similar repositories for bayes_evals
Users that are interested in bayes_evals are comparing it to the libraries listed below.
- Portfolio REgret for Confidence SEquences ☆21 · Jan 6, 2026 · Updated 4 months ago
- Oak National Academy's AI Auto Eval tools provide LLM as a judge evaluation on lesson plans and resources ☆17 · Nov 4, 2025 · Updated 6 months ago
- Teaching Models to Express Their Uncertainty in Words ☆38 · May 26, 2022 · Updated 3 years ago
- Bayesian Low-Rank Adaptation for Large Language Models ☆40 · Jun 22, 2024 · Updated last year
- Stochastic trace estimation using JAX ☆17 · Aug 20, 2025 · Updated 8 months ago
- An R package for simulating line lists ☆11 · Updated this week
- Use PaliGemma to auto-label data for use in training fine-tuned vision models. ☆12 · Jun 13, 2024 · Updated last year
- Materials for a 'Python for Science' bootcamp workshop. ☆13 · Sep 23, 2018 · Updated 7 years ago
- Files required to follow along the introduction session to machine learning with sklearn and nilearn ☆12 · Jun 18, 2020 · Updated 5 years ago
- Streamlit Multi AI Platform Chat App ☆10 · Nov 5, 2024 · Updated last year
- Neuroproc dataset descriptions and dictionaries ☆16 · Jan 2, 2017 · Updated 9 years ago
- R code to replicate analyses in Clark et al 2025 (Beyond single-species models: leveraging multispecies forecasts to navigate the dynamic… ☆15 · Feb 19, 2025 · Updated last year
- Run LLMs on Replicate with vLLM ☆26 · Jul 19, 2025 · Updated 9 months ago
- Nilearn tutorials for OHBM 2016 educational course ☆13 · Jul 13, 2016 · Updated 9 years ago
- ☆16 · Oct 25, 2022 · Updated 3 years ago
- Implicit Deep Adaptive Design (iDAD): Policy-Based Experimental Design without Likelihoods ☆23 · Dec 30, 2021 · Updated 4 years ago
- Code for the "Long Context Needs Some R&R" paper. ☆12 · Mar 11, 2024 · Updated 2 years ago
- Utility provides a more meaningful measure of forecast skill than goodness-of-fit ☆18 · Apr 26, 2022 · Updated 4 years ago
- Reverse engineering of Metacognition toolbox ☆19 · Apr 1, 2026 · Updated last month
- Vincent, B. T. (2015) A tutorial on Bayesian models of Perception, Journal of Mathematical Psychology. ☆14 · Oct 17, 2017 · Updated 8 years ago
- A simple application to build Trip-lets ☆10 · Mar 30, 2020 · Updated 6 years ago
- A simple R package for calculating (meta-) SDT measures ☆19 · Feb 5, 2024 · Updated 2 years ago
- A simple plugin for syncing movies from IMDb to Obsidian ☆17 · May 8, 2024 · Updated 2 years ago
- Examples on INLA within MCMC ☆12 · May 15, 2017 · Updated 8 years ago
- ☆10 · Apr 2, 2024 · Updated 2 years ago
- ☆18 · Dec 13, 2023 · Updated 2 years ago
- PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations ☆12 · Apr 21, 2024 · Updated 2 years ago
- ☆29 · Sep 19, 2025 · Updated 7 months ago
- ☆11 · Nov 27, 2019 · Updated 6 years ago
- Notebooks to demonstrate TimmWrapper ☆16 · Jan 16, 2025 · Updated last year
- MATLAB model of the auditory periphery ☆17 · Nov 28, 2011 · Updated 14 years ago
- ☆13 · Dec 2, 2024 · Updated last year
- Code related to the paper "Time series classification with random convolution kernels: pooling operators and input representations matter… ☆15 · Jan 14, 2026 · Updated 3 months ago
- statistical models to analyze diagnostic tests ☆16 · Nov 19, 2020 · Updated 5 years ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆59 · Sep 20, 2024 · Updated last year
- Magnitude achieves SOTA 94% on WebVoyager benchmark ☆37 · Jul 7, 2025 · Updated 10 months ago
- A research project exploring fine-tuning BERT-style models for text generation ☆40 · Nov 30, 2025 · Updated 5 months ago
- Estimate epidemiological delay distributions with brms ☆15 · Apr 13, 2026 · Updated 3 weeks ago
- A project comparing the implementations of a basic AI agent using Langchain and PydanticAI frameworks ☆18 · Jan 27, 2025 · Updated last year