AlphaPav / mem-kk-logic
On Memorization of Large Language Models in Logical Reasoning
☆16 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for mem-kk-logic
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers ☆75 · Updated last month
- Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment" ☆65 · Updated last year
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆84 · Updated 5 months ago
- Official repository for the paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆28 · Updated 4 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity ☆57 · Updated 2 weeks ago
- Test-time training on nearest neighbors for large language models ☆27 · Updated 7 months ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆30 · Updated 8 months ago
- [SafeGenAi @ NeurIPS 2024] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates ☆60 · Updated 3 weeks ago
- Röttger et al. (2023): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆64 · Updated 10 months ago
- Codebase for decoding compressed trust ☆20 · Updated 6 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆71 · Updated last month
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆97 · Updated 2 months ago
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates" ☆17 · Updated 8 months ago
- Long Context Extension and Generalization in LLMs ☆39 · Updated 2 months ago
- `dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms ☆33 · Updated 2 weeks ago
- AI Logging for Interpretability and Explainability 🔬 ☆89 · Updated 5 months ago
- [ICLR 2024] Provable Robust Watermarking for AI-Generated Text ☆26 · Updated 11 months ago
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆15 · Updated 5 months ago
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers" ☆84 · Updated 8 months ago
- Official Repository for Dataset Inference for LLMs ☆23 · Updated 3 months ago