collinzrj / output2prompt
☆32 Updated last month
Related projects:
- Lightweight tool to identify Data Contamination in LLMs evaluation ☆39 Updated 6 months ago
- Weak-to-Strong Jailbreaking on Large Language Models ☆62 Updated 6 months ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆36 Updated 3 months ago
- ☆32 Updated 10 months ago
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs. ☆73 Updated 7 months ago
- ☆37 Updated 10 months ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents ☆57 Updated last month
- ☆16 Updated 6 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆109 Updated 11 months ago
- ConceptVectors Benchmark and Code for the paper "Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces" ☆14 Updated 2 weeks ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆79 Updated 3 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆81 Updated last month
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆82 Updated 2 months ago
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆105 Updated last year
- ☆44 Updated 2 weeks ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆39 Updated 7 months ago
- Knowledge Circuits in Pretrained Transformers ☆46 Updated this week
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆55 Updated 8 months ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.… ☆34 Updated 2 months ago
- Min-K%++: Improved baseline for detecting pre-training data of LLMs https://arxiv.org/abs/2404.02936 ☆25 Updated 3 months ago
- The Prism Alignment Project ☆32 Updated 4 months ago
- ☆143 Updated 9 months ago
- Codebase for Inference-Time Policy Adapters ☆19 Updated 10 months ago
- Does Refusal Training in LLMs Generalize to the Past Tense? [arXiv, July 2024] ☆49 Updated 2 months ago
- ☆31 Updated 3 months ago
- ☆30 Updated last year
- Implementation of the paper "Exploring the Universal Vulnerability of Prompt-based Learning Paradigm" on Findings of NAACL 2022 ☆26 Updated 2 years ago
- Improving Alignment and Robustness with Circuit Breakers ☆124 Updated 2 months ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (paragraph-length experiments). ☆38 Updated 5 months ago
- We have released the code and demo program required for LLM with self-verification ☆45 Updated 11 months ago