l294265421 / ChatGPT-Techniques-Introduction-for-Everyone
An introduction to ChatGPT techniques
☆18 · Updated last year
Related projects:
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆23 · Updated 9 months ago
- Domain-specific preference (DSP) data and customized RM fine-tuning ☆24 · Updated 6 months ago
- Code for the ACL 2024 paper "Adversarial Preference Optimization (APO)" ☆49 · Updated 3 months ago
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" ☆22 · Updated 2 months ago
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement" ☆28 · Updated 8 months ago
- AI Alignment: A Comprehensive Survey ☆123 · Updated 10 months ago
- Benchmarking LLMs' gaming ability in multi-agent environments ☆33 · Updated this week
- [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods ☆23 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆89 · Updated 2 months ago
- CS 294-112 @ UCB Deep RL ☆22 · Updated last year
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models (EMNLP Findings 2023) ☆20 · Updated 9 months ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models ☆33 · Updated 9 months ago
- Official implementation of "Direct Preference-based Policy Optimization without Reward Modeling" (NeurIPS 2023) ☆37 · Updated 2 months ago
- Feeling confused about superalignment? Here is a reading list ☆42 · Updated 8 months ago
- Training LLaMA or MOSS with RLHF, optionally with LoRA ☆20 · Updated last year
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆84 · Updated 5 months ago
- Information on NLP PhD applications worldwide ☆34 · Updated 3 weeks ago
- Code for the paper "Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning" ☆26 · Updated 6 months ago
- Dataset Reset Policy Optimization ☆27 · Updated 5 months ago
- [ACL 2023 Findings] What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning ☆21 · Updated last year
- [ACL 2024] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆44 · Updated last month