openai / democratic-inputs
☆66 Updated 2 weeks ago
Alternatives and similar repositories for democratic-inputs:
Users who are interested in democratic-inputs are comparing it to the libraries listed below.
- ☆90 Updated last month
- ☆26 Updated 10 months ago
- Sphynx Hallucination Induction ☆53 Updated 2 months ago
- An Open Source Playground with Agent Datasets and APIs for building and testing your own Autonomous Web Agents ☆191 Updated last year
- Just a bunch of benchmark logs for different LLMs ☆119 Updated 9 months ago
- ☆266 Updated 9 months ago
- Evaluating LLMs with CommonGen-Lite ☆89 Updated last year
- OpenPipe ART (Agent Reinforcement Trainer): train LLM agents ☆123 Updated this week
- never forget anything again! combine AI and intelligent tooling for a local knowledge base to track catalogue, annotate, and plan for you… ☆37 Updated 11 months ago
- Public Inflection Benchmarks ☆68 Updated last year
- Functional Benchmarks and the Reasoning Gap ☆85 Updated 6 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆139 Updated this week
- Verdict is a library for scaling judge-time compute. ☆199 Updated last week
- A distributed agent orchestration framework for market agents ☆88 Updated this week
- Finetune Llama-3-8b on the MathInstruct dataset ☆110 Updated 6 months ago
- Turing machines, Rule 110, and A::B reversal using Claude 3 Opus. ☆59 Updated 11 months ago
- Track the progress of LLM context utilisation ☆54 Updated 2 weeks ago
- ☆137 Updated last month
- A repository of prompts and Python scripts for intelligent transformation of raw text into diverse formats. ☆30 Updated last year
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆74 Updated last year
- Repository for the paper Stream of Search: Learning to Search in Language ☆145 Updated 2 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆189 Updated 4 months ago
- Tutorial for building LLM router ☆194 Updated 9 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆90 Updated 3 months ago
- Collection of evals for Inspect AI ☆117 Updated this week
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆46 Updated 2 months ago
- This repository explains and provides examples for "concept anchoring" in GPT4. ☆72 Updated last year
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆43 Updated 5 months ago
- ☆153 Updated 9 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆99 Updated last year