openai / emergent-misalignment-persona-features
☆30Updated 2 months ago
Alternatives and similar repositories for emergent-misalignment-persona-features
Users who are interested in emergent-misalignment-persona-features are comparing it to the repositories listed below
- Open Source Replication of Anthropic's Alignment Faking Paper☆50Updated 4 months ago
- ☆19Updated 3 weeks ago
- A curated list of materials on AI guardrails☆40Updated 2 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆88Updated this week
- Source code for the collaborative reasoner research project at Meta FAIR.☆103Updated 4 months ago
- Official Repo for CRMArena and CRMArena-Pro☆109Updated 2 months ago
- ☆119Updated 2 weeks ago
- Small, simple agent task environments for training and evaluation☆18Updated 9 months ago
- Red-Teaming Language Models with DSPy☆212Updated 6 months ago
- Public repository containing METR's DVC pipeline for eval data analysis☆104Updated 4 months ago
- ☆78Updated last week
- ☆56Updated 2 months ago
- A method for steering LLMs to better follow instructions☆49Updated 3 weeks ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆60Updated last year
- Automatic Prompt Optimization☆40Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆179Updated 5 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆74Updated 5 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆75Updated 8 months ago
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation☆33Updated last month
- A framework for fine-tuning retrieval-augmented generation (RAG) systems.☆127Updated this week
- ☆54Updated 9 months ago
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks!☆54Updated last month
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆33Updated 4 months ago
- Large Language Model (LLM) powered evaluator for Retrieval Augmented Generation (RAG) pipelines.☆30Updated last year
- Official Repo for InSTA: Towards Internet-Scale Training For Agents☆55Updated last month
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.☆168Updated 2 weeks ago
- A better way of testing, inspecting, and analyzing AI Agent traces.☆40Updated last month
- Learning to route instances for Human vs AI Feedback (ACL 2025 Main)☆23Updated last month
- BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated c…☆37Updated 4 months ago
- ☆45Updated 3 months ago