Phylliida / OpenClio

Open source version of Anthropic's Clio: A system for privacy-preserving insights into real-world AI use

☆47

Alternatives and similar repositories for OpenClio

Users who are interested in OpenClio are comparing it to the libraries listed below.

haizelabs / verdict
Inference-time scaling for LLMs-as-a-judge.
☆302 Updated 2 weeks ago
TransluceAI / observatory
A toolkit for describing model features and intervening on those features to steer behavior.
☆207 Updated 11 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆161 Updated 5 months ago
princeton-pli / hal-harness
☆167 Updated this week
google-deepmind / mishax
☆142 Updated last month
haizelabs / Awesome-LLM-Judges
⚖️ Awesome LLM Judges ⚖️
☆131 Updated 5 months ago
emergent-misalignment / emergent-misalignment
☆218 Updated 7 months ago
UKGovernmentBEIS / inspect_evals
Collection of evals for Inspect AI
☆254 Updated this week
METR / RE-Bench
☆112 Updated this week
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆286 Updated 2 weeks ago
METR / vivaria
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
☆116 Updated last week
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆34 Updated 6 months ago
METR / task-standard
METR Task Standard
☆163 Updated 8 months ago
Ziems / arbor
A framework for optimizing DSPy programs with RL
☆202 Updated this week
safety-research / persona_vectors
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
☆258 Updated 2 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆187 Updated 7 months ago
ScalingIntelligence / codemonkeys
☆57 Updated 8 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆151 Updated 8 months ago
allenai / infinigram-api
☆80 Updated this week
KihoPark / LLM_Categorical_Hierarchical_Representations
☆109 Updated 8 months ago
PrimeIntellect-ai / prime-environments
Training-Ready RL Environments + Evals
☆128 Updated this week
SALT-NLP / collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together with humans.
☆102 Updated 2 weeks ago
centerforaisafety / emergent-values
Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs"
☆55 Updated 7 months ago
zbambergerNLP / strategic-debate-tot
A DSPy-based implementation of the tree of thoughts method (Yao et al., 2023) for generating persuasive arguments
☆89 Updated 2 weeks ago
allenai / discoverybench
Discovering Data-driven Hypotheses in the Wild
☆113 Updated 4 months ago
vinid / NegotiationArena
☆77 Updated last year
PrimeIntellect-ai / genesys
☆135 Updated 7 months ago
anthropics / evals
☆304 Updated last year
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆89 Updated last year
giorgiopiatti / GovSim
Governance of the Commons Simulation (GovSim)
☆59 Updated 9 months ago