thu-coai / BPO
☆334Jun 24, 2024Updated last year
Alternatives and similar repositories for BPO
Users that are interested in BPO are comparing it to the libraries listed below
- Unofficial implementation of Chain of Hindsight (https://arxiv.org/abs/2302.02676) using pytorch and huggingface Trainers.☆11Apr 5, 2023Updated 2 years ago
- ☆313Jun 9, 2024Updated last year
- ☆16Jul 23, 2024Updated last year
- [ACL 2024] Progressive LLaMA with Block Expansion.☆514May 20, 2024Updated last year
- AgentTuning: Enabling Generalized Agent Abilities for LLMs☆1,477Oct 31, 2023Updated 2 years ago
- ChatGLM multi-GPU with DeepSpeed and☆409Jul 8, 2024Updated last year
- [NIPS2023] RRHF & Wombat☆808Sep 22, 2023Updated 2 years ago
- The official code for "Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning"☆264May 9, 2024Updated last year
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"☆445Oct 16, 2024Updated last year
- Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)☆2,696Aug 14, 2024Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆416Jun 25, 2025Updated 7 months ago
- Yuan 2.0 Large Language Model☆689Jul 11, 2024Updated last year
- ☆84Apr 18, 2024Updated last year
- 🩹Editing large language models within 10 seconds⚡☆1,361Aug 13, 2023Updated 2 years ago
- ☆1,338Apr 29, 2024Updated last year
- A series of large language models developed by Baichuan Intelligent Technology☆4,118Nov 8, 2024Updated last year
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Feb 29, 2024Updated last year
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters☆1,897Jan 21, 2024Updated 2 years ago
- [Preprint] Learning to Filter Context for Retrieval-Augmented Generation☆196Apr 6, 2024Updated last year
- Secrets of RLHF in Large Language Models Part I: PPO☆1,416Mar 3, 2024Updated last year
- LongQLoRA: Extend Context Length of LLMs Efficiently☆168Nov 12, 2023Updated 2 years ago
- ☆147Jul 1, 2024Updated last year
- ACL24☆11Jun 7, 2024Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆588Dec 9, 2024Updated last year
- A large-scale 7B pretraining language model developed by BaiChuan-Inc.☆5,686Jul 18, 2024Updated last year
- Codebase for Merging Language Models (ICML 2024)☆864May 5, 2024Updated last year
- The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.☆446Oct 11, 2024Updated last year
- An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)☆8,989Feb 6, 2026Updated last week
- Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu…☆354Jun 18, 2023Updated 2 years ago
- ☆46Jun 11, 2025Updated 8 months ago
- ☆147Apr 16, 2024Updated last year
- Generative Judge for Evaluating Alignment☆250Jan 18, 2024Updated 2 years ago
- Official repository for ORPO☆471May 31, 2024Updated last year
- Reference implementation for DPO (Direct Preference Optimization)☆2,850Aug 11, 2024Updated last year
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct☆191Jan 16, 2025Updated last year
- Benchmarking long-form factuality in large language models. Original code for our paper "Long-form factuality in large language models".☆665Feb 5, 2026Updated last week
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models☆1,894Jan 16, 2024Updated 2 years ago
- ☆970Jan 23, 2025Updated last year
- ☆79Dec 15, 2023Updated 2 years ago