yihedeng9 / DuoGuard
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails
☆21Updated 2 months ago
Alternatives and similar repositories for DuoGuard:
Users who are interested in DuoGuard are comparing it to the repositories listed below.
- Codebase for Instruction Following without Instruction Tuning☆34Updated 7 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433☆25Updated 5 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated last year
- ☆20Updated 6 months ago
- ☆17Updated 4 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆47Updated 4 months ago
- Exploration of automated dataset selection approaches at large scales.☆39Updated 2 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.☆62Updated 6 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Updated last year
- ☆29Updated 4 months ago
- Benchmarking Benchmark Leakage in Large Language Models☆51Updated 11 months ago
- ☆14Updated last year
- ☆20Updated 2 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models☆51Updated 2 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy☆60Updated 4 months ago
- Revisiting Mid-training in the Era of RL Scaling☆35Updated last week
- ☆11Updated 10 months ago
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆60Updated 9 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"☆74Updated 10 months ago
- ☆22Updated 4 months ago
- ☆62Updated last month
- ☆22Updated 10 months ago
- ☆16Updated 9 months ago
- AbstainQA, ACL 2024☆25Updated 6 months ago
- Code for preprint "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"☆37Updated last month
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"☆36Updated 11 months ago
- Towards Systematic Measurement for Long Text Quality☆34Updated 8 months ago
- ☆50Updated 2 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.☆21Updated 2 months ago
- ☆28Updated 6 months ago