lechmazur/pact

A benchmark for conversational bargaining by language models. In each 20-round match, one LLM plays the buyer, one plays the seller, and each holds a hidden private value. Every round, the two swap a short public message and then post a bid or an ask; a deal clears whenever the bid meets the ask.
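The round structure described above can be sketched as a simple loop. This is a minimal illustration, not pact's code. The `Trader` class, its linear concession rule, midpoint pricing, and stopping after the first cleared deal are all assumptions made for the example. Real pact agents are LLMs that write free-form messages, and the wording "whenever" may mean trades can clear in more than one round.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Trader:
    """Hypothetical rule-based stand-in for an LLM trader (not pact's actual agent)."""
    role: str             # "buyer" or "seller"
    private_value: float  # hidden from the other side
    opening: float        # first bid (buyer) or first ask (seller)
    step: float           # ASSUMPTION: fixed concession per round

    def quote(self, round_idx: int) -> float:
        if self.role == "buyer":
            # Raise the bid each round, capped at the buyer's private value.
            return min(self.opening + self.step * round_idx, self.private_value)
        # Lower the ask each round, floored at the seller's private value.
        return max(self.opening - self.step * round_idx, self.private_value)

    def message(self, round_idx: int) -> str:
        # Placeholder for the short public message an LLM would write.
        return f"{self.role}: I can do {self.quote(round_idx):.2f}"


@dataclass
class MatchResult:
    round_cleared: Optional[int]  # 1-based round of the deal, or None if no deal
    price: Optional[float]
    messages: List[str] = field(default_factory=list)


def run_match(buyer: Trader, seller: Trader, rounds: int = 20) -> MatchResult:
    messages: List[str] = []
    for r in range(rounds):
        # 1) Both sides exchange a short public message.
        messages.append(buyer.message(r))
        messages.append(seller.message(r))
        # 2) Both post a quote; the deal clears once the bid meets the ask.
        bid, ask = buyer.quote(r), seller.quote(r)
        if bid >= ask:
            price = (bid + ask) / 2  # ASSUMPTION: midpoint pricing
            return MatchResult(r + 1, price, messages)  # ASSUMPTION: stop at first deal
    return MatchResult(None, None, messages)
```

For example, `run_match(Trader("buyer", 100, 50, 5), Trader("seller", 60, 120, 5))` clears in round 8 at a price of 85.0, because both quotes reach 85 at that point. If the buyer's value is below the seller's, the quotes never cross and the match ends with no deal after 20 rounds.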

☆44

Alternatives and similar repositories for pact

Users who are interested in pact are comparing it to the repositories listed below.


lechmazur / writing_styles
Documents the style side of the short-story Creative Writing LLM benchmark: we generated many short stories with a range of LLMs, then an…
☆25 · Dec 18, 2025 · Updated 7 months ago
lechmazur / persuasion
LLM Persuasion Benchmark tests whether one language model can change another model’s stated position over the course of a multi-turn conv…
☆31 · Mar 27, 2026 · Updated 3 months ago
lechmazur / emergent_collusion
Systemic, uninstructed collusion among frontier LLMs in a simulated bidding environment
☆18 · Jul 15, 2025 · Updated last year
lechmazur / step_game
Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLM…
☆89 · Dec 9, 2025 · Updated 7 months ago
lechmazur / generalization
Thematic Generalization Benchmark: measures how effectively various LLMs can infer a narrow or specific "theme" (category/rule) from a sm…
☆72 · Apr 16, 2026 · Updated 3 months ago
lechmazur / divergent
LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter with no connections to each oth…
☆35 · Mar 20, 2025 · Updated last year
lechmazur / nyt-connections
Benchmark that evaluates LLMs using 759 NYT Connections puzzles extended with extra trick words
☆230 · Jul 17, 2026 · Updated last week
lechmazur / confabulations
Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.
☆247 · Aug 7, 2025 · Updated 11 months ago
lechmazur / writing
This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, moti…
☆408 · Updated this week
lechmazur / position_bias
A benchmark for testing whether LLM judges keep the same preference when two lightly edited versions of the same story are shown in oppos…
☆15 · Jun 11, 2026 · Updated last month
kobihackenburg / GPT-4-political-microtargeting
Project repository for "Evaluating the persuasive influence of political microtargeting with large language models" by Kobi Hackenburg an…
☆11 · Jun 19, 2024 · Updated 2 years ago
lechmazur / elimination_game
A multi-player tournament benchmark that tests LLMs in social reasoning, strategy, and deception. Players engage in public and private co…
☆301 · Jan 7, 2026 · Updated 6 months ago
jwest33 / latent_control_adapters
Multi-vector latent space steering adapter module for language models
☆20 · Nov 22, 2025 · Updated 8 months ago
martianlantern / ThinkMesh
This is a framework that implements various parallel reasoning strategies from the literature
☆275 · Dec 18, 2025 · Updated 7 months ago
osome-iu / AI_fact_checking
We conduct a preregistered experiment to investigate whether fact checks provided by a large language model can serve as an effective mis…
☆13 · Dec 14, 2024 · Updated last year
plloydsmith / rmdcev
Implements the MDCEV model in R using Stan.
☆13 · Updated this week
yataoz / face_reenact_GDPW
Code repository for the BMVC 2022 paper: Geometry Driven Progressive Warping for One Shot Face Animation
☆12 · Jan 6, 2023 · Updated 3 years ago
spenserhuang / messari-api-exploration
Repo for the "Exploring Messari's Crypto API" article
☆10 · Dec 19, 2018 · Updated 7 years ago
cs231x / super-resolution-detection
End-to-End Super Resolution Object Detection Networks
☆12 · Jun 8, 2018 · Updated 8 years ago
OPTML-Group / Unlearn-Trace
[ICLR26] Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
☆24 · Apr 8, 2026 · Updated 3 months ago
lpuettmann / automation-patents
Data from paper: "Benign Effects of Automation: New Evidence from Patent Texts"
☆15 · May 31, 2025 · Updated last year
MathAI-LAB / PTMDA
This is the PyTorch demo code for Multi-Source Unsupervised Domain Adaptation via Pseudo Target Domain (PTMDA) (IEEE Transactions on Ima…
☆11 · Apr 15, 2022 · Updated 4 years ago
iceener / llm-tools-merger
☆15 · Aug 24, 2024 · Updated last year
limteng-rpi / mlmt
Code for the paper "A Multi-lingual Multi-task Architecture for Low-resource Sequence Labeling" (ACL2018)
☆29 · Nov 6, 2019 · Updated 6 years ago
runchu-tian / LongPiBench
The repository for the paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs"
☆14 · Dec 16, 2024 · Updated last year
summitgao / MSFMamba
Code for "MSFMamba: Multi-Scale Feature Fusion State Space Model for Multi-Source Remote Sensing Image Classification"
☆10 · Aug 26, 2024 · Updated last year
ikantkode / qwen3-2b-ocr-app
A simple Streamlit app for experimenting with qwen3-2b-VL to perform OCR. Dockerized setup, tested on a 3060 with 12 GB.
☆32 · Nov 23, 2025 · Updated 8 months ago
jzhang538 / BadMerging
[CCS 2024] "BadMerging: Backdoor Attacks Against Model Merging": official code implementation.
☆36 · Aug 22, 2024 · Updated last year
KristofferOlesen / Datasets-of-AIS-Trajectories-from-Danish-Waters
Public filtered data sets of AIS Trajectories from Danish Waters. Data sets vary in ROI size, time period, included ship types, etc. Some …
☆15 · Oct 23, 2023 · Updated 2 years ago
CogitatorTech / binharic-cli
A multi-provider AI coding agent with the persona of a Tech-Priest
☆18 · Nov 1, 2025 · Updated 8 months ago
cloudflare / notebook-examples
These examples demonstrate how to use the Cloudflare API within interactive Python notebooks.
☆25 · Jun 3, 2026 · Updated last month
TAR-ALEX / llm-html
☆20 · Jul 4, 2025 · Updated last year
lzhxmu / AccDiffusion
Code release for AccDiffusion (ECCV 2024)
☆92 · Nov 19, 2024 · Updated last year
yuchenlwu / PersonalizedSafety
[NeurIPS 2025]: Personalized Safety in LLMs — A Benchmark and a Planning-Based Agent Approach
☆17 · Oct 30, 2025 · Updated 8 months ago
tengmmvp / Seedream_MCP
Doubao-Seedream image-generation MCP (Jimeng image-generation MCP)
☆16 · Jul 9, 2026 · Updated 2 weeks ago
graves / dirdocs
Recursively generate descriptions of every file in a directory then append that description to Nushell's ls.
☆16 · Oct 7, 2025 · Updated 9 months ago
F2-Song / Weak-to-Strong-Decoding
The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding"
☆22 · Jun 26, 2025 · Updated last year
stammen / angleproject
Angle Project (https://code.google.com/p/angleproject/) with support for Windows Store Apps (WinRT)
☆44 · May 2, 2014 · Updated 12 years ago
rdumasia303 / tensorrt-llm_with_open-webui
A simple docker compose setup that works (in limited testing) on Blackwell cards
☆15 · Oct 13, 2025 · Updated 9 months ago