HannahKirk / prism-alignment
The Prism Alignment Project
☆37 Updated 6 months ago
Related projects
Alternatives and complementary repositories for prism-alignment
- ☆94 Updated 6 months ago
- Inspecting and Editing Knowledge Representations in Language Models ☆107 Updated last year
- ☆33 Updated last year
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆61 Updated 10 months ago
- ☆44 Updated 2 months ago
- ☆21 Updated 8 months ago
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆42 Updated 2 months ago
- This repository contains data, code and models for contextual noncompliance. ☆18 Updated 3 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆48 Updated 8 months ago
- Code/data for MARG (multi-agent review generation) ☆30 Updated 5 months ago
- AbstainQA, ACL 2024 ☆19 Updated 3 weeks ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆62 Updated last year
- ☆36 Updated 3 months ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging ☆96 Updated last year
- ☆61 Updated 7 months ago
- Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration" ☆30 Updated 2 months ago
- Implementation of the paper "Goal-Driven Explainable Clustering via Language Descriptions" ☆35 Updated last year
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆54 Updated last year
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆29 Updated 8 months ago
- Code and data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆78 Updated 2 months ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks ☆29 Updated 8 months ago
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… ☆84 Updated 3 years ago
- ☆34 Updated 2 years ago
- Official code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le… ☆68 Updated 7 months ago
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions ☆39 Updated 4 months ago
- Repo accompanying our paper "Do Llamas Work in English? On the Latent Language of Multilingual Transformers". ☆56 Updated 7 months ago
- Token-level Reference-free Hallucination Detection ☆92 Updated last year
- ☆40 Updated 11 months ago
- ☆76 Updated last year
- Repository for the Bias Benchmark for QA dataset. ☆84 Updated 10 months ago