This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"
☆17 · Feb 22, 2024 · Updated 2 years ago
Alternatives and similar repositories for safer-instruct
Users that are interested in safer-instruct are comparing it to the libraries listed below
- Self-Supervised Alignment with Mutual Information ☆20 · May 24, 2024 · Updated last year
- Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta over a custom-curated dataset of 1… ☆14 · Jan 19, 2024 · Updated 2 years ago
- ☆13 · Jan 22, 2025 · Updated last year
- Dataset Reset Policy Optimization ☆31 · Apr 12, 2024 · Updated last year
- ☆13 · Jun 4, 2024 · Updated last year
- Understanding the correlation between different LLM benchmarks ☆29 · Jan 11, 2024 · Updated 2 years ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆33 · May 9, 2024 · Updated last year
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Feb 29, 2024 · Updated 2 years ago
- A set of scripts to easily convert all training data from Hugging Face into Alpaca instruct or ShareGPT format, which should allow for eas… ☆18 · Mar 14, 2025 · Updated 11 months ago
- ☆36 · Jul 7, 2025 · Updated 7 months ago
- ☆16 · Jul 23, 2024 · Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 · Jun 3, 2024 · Updated last year
- ☆20 · Nov 3, 2024 · Updated last year
- [ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities ☆28 · Apr 2, 2025 · Updated 11 months ago
- ☆46 · Jun 11, 2025 · Updated 8 months ago
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆53 · Aug 10, 2025 · Updated 6 months ago
- Scaling Long-Horizon LLM Agent via Context-Folding ☆117 · Jan 26, 2026 · Updated last month
- Aioli: A unified optimization framework for language model data mixing ☆32 · Jan 17, 2025 · Updated last year
- Code and data for "ImgTrojan: Jailbreaking Vision-Language Models with ONE Image" ☆24 · Mar 26, 2025 · Updated 11 months ago
- ☆51 · Oct 28, 2024 · Updated last year
- ☆160 · Nov 23, 2024 · Updated last year
- ☆23 · Aug 7, 2023 · Updated 2 years ago
- Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models" ☆102 · Jul 9, 2024 · Updated last year
- ☆27 · Aug 30, 2023 · Updated 2 years ago
- Package to optimize adversarial attacks against (large) language models with varied objectives ☆70 · Feb 22, 2024 · Updated 2 years ago
- Code and scripts for "David's Slingshot: A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis" ☆34 · Jun 13, 2025 · Updated 8 months ago
- A list of papers about data quality in Large Language Models (LLMs) ☆27 · Dec 14, 2023 · Updated 2 years ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025) ☆73 · Jun 25, 2024 · Updated last year
- ☆32 · Jul 8, 2024 · Updated last year
- ☆17 · Sep 1, 2024 · Updated last year
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆28 · Dec 19, 2023 · Updated 2 years ago
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks ☆32 · Jul 9, 2024 · Updated last year
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continual pre-training enhances … ☆37 · May 31, 2025 · Updated 9 months ago
- ☆30 · Feb 16, 2024 · Updated 2 years ago
- Never forget anything again! Combine AI and intelligent tooling for a local knowledge base to track, catalogue, annotate, and plan for you… ☆37 · May 14, 2024 · Updated last year
- ☆32 · Jun 5, 2025 · Updated 8 months ago
- ☆313 · Jun 9, 2024 · Updated last year
- ☆13 · Apr 27, 2021 · Updated 4 years ago
- ☆34 · Sep 14, 2024 · Updated last year