gregdurrett / fp-dataset-artifacts
Final project starter code for NLP classes at UT Austin: provides a HuggingFace trainer to enable studying of dataset artifacts.
☆18 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for fp-dataset-artifacts
- Codes for the paper The emergence of clusters in self-attention dynamics. ☆12 Updated 10 months ago
- ☆27 Updated last year
- ☆63 Updated last month
- A Random Matrix Approach to Extreme Learning Machine ☆14 Updated 6 years ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆62 Updated last year
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆15 Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆25 Updated 5 months ago
- Evaluating LLMs with fewer examples ☆134 Updated 7 months ago
- The code of paper "Toward Optimal LLM Alignments Using Two-Player Games". ☆14 Updated 4 months ago
- Lectures on NLP ☆12 Updated last year
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆26 Updated last week
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆84 Updated 8 months ago
- ☆156 Updated last week
- ☆24 Updated 2 months ago
- Google Research ☆45 Updated 2 years ago
- ☆18 Updated last month
- ☆25 Updated 4 months ago
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆17 Updated 6 months ago
- Minimalist BERT implementation assignment for CS11-711 ☆79 Updated 2 years ago
- ☆22 Updated last year
- Official Repository for ICML 2023 paper "Can Neural Network Memorization Be Localized?" ☆16 Updated last year
- Improving Alignment and Robustness with Circuit Breakers ☆152 Updated last month
- Code for "The Expressive Power of Low-Rank Adaptation". ☆18 Updated 6 months ago
- ☆11 Updated 2 years ago
- This repository holds code and other relevant files for the NeurIPS 2022 tutorial: Foundational Robustness of Foundation Models. ☆70 Updated last year
- Post-processing for fair classification ☆11 Updated last week
- Privacy backdoors ☆47 Updated 6 months ago
- ☆75 Updated 4 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? ☆23 Updated 5 months ago
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases" ☆15 Updated 2 years ago