sail-sg / lm-random-memory-access
☆15Updated 8 months ago
Related projects
Alternatives and complementary repositories for lm-random-memory-access
- ☆24Updated last year
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep☆28Updated 4 months ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]☆15Updated 6 months ago
- ☆44Updated 10 months ago
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”☆15Updated 8 months ago
- ☆16Updated 4 months ago
- ☆26Updated 6 months ago
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"☆47Updated last month
- Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)☆21Updated 4 months ago
- ☆24Updated 6 months ago
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA…☆28Updated 4 months ago
- ☆36Updated last year
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆45Updated 7 months ago
- ☆20Updated 4 months ago
- ☆21Updated last month
- ☆22Updated this week
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$☆29Updated 3 weeks ago
- Restore safety in fine-tuned language models through task arithmetic☆26Updated 7 months ago
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)☆12Updated last year
- Methods and evaluation for aligning language models temporally☆24Updated 8 months ago
- Code & Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"☆10Updated 5 months ago
- ☆33Updated 9 months ago
- Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).☆14Updated last year
- Official code for the paper "Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation"☆16Updated 8 months ago
- Official code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le…☆68Updated 8 months ago
- [EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut…☆20Updated last month
- `dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.☆32Updated 2 weeks ago
- ☆40Updated 11 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.☆54Updated 2 weeks ago
- The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization…☆13Updated 9 months ago