decoding-comp-trust / comp-trust
Codebase for decoding compressed trust.
☆20 Updated 6 months ago
Related projects
Alternatives and complementary repositories for comp-trust
- [ATTRIB @ NeurIPS 2024] When Attention Sink Emerges in Language Models: An Empirical View ☆29 Updated last month
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆28 Updated 4 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆84 Updated 5 months ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆36 Updated 5 months ago
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates ☆60 Updated 3 weeks ago
- Official code for the paper: Evaluating Copyright Takedown Methods for Language Models ☆15 Updated 4 months ago
- This is the official implementation of ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting ☆14 Updated 3 months ago
- Multilingual safety benchmark for Large Language Models ☆24 Updated 2 months ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ☆15 Updated 6 months ago
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆15 Updated 5 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆29 Updated 3 weeks ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆75 Updated last month
- Long Context Extension and Generalization in LLMs ☆39 Updated 2 months ago
- Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆59 Updated last month
- EMNLP 2024: Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue ☆33 Updated this week
- Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models" ☆55 Updated last year
- Lightweight tool to identify Data Contamination in LLMs evaluation ☆42 Updated 8 months ago
- Min-K%++: Improved baseline for detecting pre-training data of LLMs https://arxiv.org/abs/2404.02936 ☆26 Updated 5 months ago
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024] ☆49 Updated last week
- Official Repository for Dataset Inference for LLMs