openai / democratic-inputs
☆56 Updated 8 months ago
Related projects
Alternatives and complementary repositories for democratic-inputs
- ☆76 Updated 10 months ago
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆72 Updated 10 months ago
- ☆100 Updated 3 months ago
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- ☆239 Updated 4 months ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆62 Updated last year
- DialOp: Decision-oriented dialogue environments for collaborative language agents ☆98 Updated 4 months ago
- Track the progress of LLM context utilisation ☆53 Updated 3 months ago
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆201 Updated 6 months ago
- ☆66 Updated last week
- Just a bunch of benchmark logs for different LLMs ☆114 Updated 3 months ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆38 Updated last week
- ☆30 Updated 4 months ago
- The Foundation Model Transparency Index ☆71 Updated 5 months ago
- ☆86 Updated 5 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆73 Updated 2 months ago
- A repository of prompts and Python scripts for intelligent transformation of raw text into diverse formats. ☆29 Updated last year
- Sphynx Hallucination Induction ☆47 Updated 3 months ago
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆99 Updated this week
- Automating enterprise workflows with multimodal agents ☆94 Updated last month
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆186 Updated last week
- Evaluating LLMs with CommonGen-Lite ☆84 Updated 7 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆97 Updated 7 months ago
- Collection of recipes aiding Gen AI model development ☆83 Updated this week
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆90 Updated this week
- A framework for generative software. ☆89 Updated last week
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆109 Updated 4 months ago
- A set of utilities for running few-shot prompting experiments on large-language models ☆112 Updated last year
- ☆38 Updated 3 months ago
- Build hours code to share. ☆129 Updated this week