gregdurrett / fp-dataset-artifacts
Final project starter code for NLP classes at UT Austin: provides a HuggingFace trainer to enable studying of dataset artifacts.
☆22 · Updated 4 months ago
Alternatives and similar repositories for fp-dataset-artifacts:
Users who are interested in fp-dataset-artifacts are comparing it to the repositories listed below.
- ☆103 · Updated 10 months ago
- ☆18 · Updated 2 years ago
- A Synthetic Dataset for Personal Attribute Inference (NeurIPS'24 D&B) ☆36 · Updated 3 months ago
- ☆81 · Updated 5 months ago
- Privacy backdoors ☆51 · Updated 10 months ago
- A toolkit for optimizing machine learning models for practical applications ☆26 · Updated last week
- DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models (ICLR 2024) ☆63 · Updated 5 months ago
- ☆78 · Updated last year
- PAIR.withgoogle.com and friends' work on interpretability methods ☆170 · Updated last month
- ☆196 · Updated 4 years ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 · Updated 9 months ago
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆105 · Updated 5 months ago
- Aligning AI With Shared Human Values (ICLR 2021) ☆278 · Updated last year
- Training data extraction on GPT-2 ☆180 · Updated 2 years ago
- Data and code for the preprint "In-Context Learning with Long-Context Models: An In-Depth Exploration" ☆31 · Updated 6 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆189 · Updated 5 months ago
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆19 · Updated 10 months ago
- Official Repository for Dataset Inference for LLMs ☆32 · Updated 7 months ago
- Implementation of PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆33 · Updated 4 months ago
- Code accompanying the paper "Massive Activations in Large Language Models" ☆148 · Updated last year
- Finding semantically meaningful and accurate prompts. ☆46 · Updated last year
- ☆14 · Updated 11 months ago
- Evaluating LLMs with fewer examples ☆147 · Updated 11 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity. ☆69 · Updated this week
- ☆11 · Updated last year
- In-context Example Selection with Influences ☆15 · Updated last year
- A mechanistic approach for understanding and detecting factual errors of large language models. ☆41 · Updated 8 months ago
- ☆38 · Updated last year
- [TACL] Code for "Red Teaming Language Model Detectors with Language Models" ☆19 · Updated last year
- A Survey of Hallucination in Large Foundation Models ☆54 · Updated last year