guidelabs / infembed
Find the samples, in the test data, on which your (generative) model makes mistakes.
☆26 · Updated 6 months ago
Alternatives and similar repositories for infembed:
Users who are interested in infembed are comparing it to the libraries listed below.
- ☆28 · Updated last year
- A fast, effective data attribution method for neural networks in PyTorch ☆204 · Updated 5 months ago
- The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models" ☆108 · Updated last year
- Repository for PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits, accepted at CVPR 2024 XAI4CV Works… ☆14 · Updated 10 months ago
- ☆22 · Updated 2 months ago
- ☆90 · Updated 2 months ago
- Data for "Datamodels: Predicting Predictions with Training Data" ☆96 · Updated last year
- Steering vectors for transformer language models in PyTorch / Hugging Face ☆94 · Updated last month
- ☆21 · Updated 8 months ago
- A toolkit for describing model features and intervening on those features to steer behavior. ☆176 · Updated 5 months ago
- ☆41 · Updated last year
- NeuroSurgeon is a package that enables researchers to uncover and manipulate subnetworks within models in Hugging Face Transformers ☆41 · Updated 2 months ago
- Evaluate uncertainty, calibration, accuracy, and fairness of LLMs on real-world survey data! ☆20 · Updated last week
- ☆36 · Updated 2 years ago
- Erasing concepts from neural representations with provable guarantees ☆227 · Updated 2 months ago
- AI Logging for Interpretability and Explainability🔬 ☆111 · Updated 10 months ago
- ☆31 · Updated last year
- ☆67 · Updated last year
- Official Repository for Dataset Inference for LLMs ☆33 · Updated 8 months ago
- ☆34 · Updated last year
- Adversarial Attacks on GPT-4 via Simple Random Search [Dec 2023] ☆43 · Updated 11 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 · Updated 10 months ago
- ☆121 · Updated last year
- Spurious Features Everywhere - Large-Scale Detection of Harmful Spurious Features in ImageNet ☆30 · Updated last year
- Improving Alignment and Robustness with Circuit Breakers ☆196 · Updated 6 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆167 · Updated this week
- ☆32 · Updated 4 months ago
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆62 · Updated last year
- ☆128 · Updated 2 weeks ago
- ☆83 · Updated last week