vdlad / Remarkable-Robustness-of-LLMs
Codebase for the paper "The Remarkable Robustness of LLMs: Stages of Inference?"
☆16 · Updated 6 months ago
Alternatives and similar repositories for Remarkable-Robustness-of-LLMs:
Users interested in Remarkable-Robustness-of-LLMs are comparing it to the libraries listed below
- ☆23 · Updated last month
- ☆67 · Updated 5 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆28 · Updated 2 months ago
- ☆48 · Updated 11 months ago
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆35 · Updated last year
- Code for Adaptive Data Optimization ☆21 · Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆53 · Updated 4 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆31 · Updated 2 months ago
- Minimal implementation of the Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models paper (arXiv 2401.01335) ☆29 · Updated 10 months ago
- Monet: Mixture of Monosemantic Experts for Transformers ☆43 · Updated this week
- ☆46 · Updated 2 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated 11 months ago
- Evaluation of neuro-symbolic engines ☆34 · Updated 5 months ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆40 · Updated 3 weeks ago
- Codebase accompanying the Summary of a Haystack paper. ☆75 · Updated 3 months ago
- Minimum Description Length probing for neural network representations ☆18 · Updated last week
- Finding semantically meaningful and accurate prompts. ☆46 · Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆68 · Updated last month
- Public code repo for the paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆97 · Updated 3 months ago
- Repository for the CONFLARE (CONformal LArge language model REtrieval) Python package. ☆17 · Updated 9 months ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ☆38 · Updated last year
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆95 · Updated 9 months ago
- ☆39 · Updated 3 years ago
- Learning to route instances for Human vs AI Feedback ☆16 · Updated this week
- TARGET: a benchmark for evaluating Table Retrieval for Generative Tasks such as Fact Verification and Text-to-SQL ☆17 · Updated this week
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆64 · Updated 7 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆42 · Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 · Updated 7 months ago
- A mechanistic approach for understanding and detecting factual errors of large language models. ☆39 · Updated 6 months ago
- ☆46 · Updated 6 months ago