saffronh / ccai
Data processing for the Collective Constitutional AI project (a collaboration between The Collective Intelligence Project & Anthropic)
☆21 Updated last year
Alternatives and similar repositories for ccai:
Users who are interested in ccai compare it to the repositories listed below.
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆47 Updated last month
- Functional Benchmarks and the Reasoning Gap ☆82 Updated 3 months ago
- Just a bunch of benchmark logs for different LLMs ☆117 Updated 6 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆64 Updated 7 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆108 Updated last year
- Evaluating LLMs with fewer examples ☆141 Updated 9 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆53 Updated 5 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆20 Updated last month
- Evaluating LLMs with CommonGen-Lite ☆88 Updated 10 months ago
- For experiments involving instruct gpt. Currently used for documenting open research questions. ☆71 Updated 2 years ago
- Official code for the paper "ADaPT: As-Needed Decomposition and Planning with Language Models" ☆74 Updated last year
- Mixing Language Models with Self-Verification and Meta-Verification ☆100 Updated last month
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆40 Updated 10 months ago
- Code repository for the c-BTM paper ☆105 Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆156 Updated 3 months ago
- Track the progress of LLM context utilisation ☆53 Updated 6 months ago
- Replicating O1 inference-time scaling laws ☆73 Updated last month
- Code accompanying "How I learned to start worrying about prompt formatting". ☆100 Updated 3 months ago
- A set of utilities for running few-shot prompting experiments on large-language models ☆116 Updated last year
- This is the official repository of the paper "OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI" ☆91 Updated last month
- Public Inflection Benchmarks ☆69 Updated 10 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 Updated 11 months ago
- LLMs as Collaboratively Edited Knowledge Bases ☆43 Updated 11 months ago
- Comparing retrieval abilities from GPT4-Turbo and a RAG system on a toy example for various context lengths ☆35 Updated last year