jys5609 / GPT-Critic
GPT-Critic: Offline Reinforcement Learning for End-to-End Task-Oriented Dialogue Systems
☆10 Updated 2 years ago
Related projects
Alternatives and complementary repositories for GPT-Critic
- Code and data for "Inferring Rewards from Language in Context" [ACL 2022]. ☆15 Updated 2 years ago
- 😜 Contrastive Learning of Sentence Embedding using LoRA (EECS487 final project) ☆12 Updated last year
- Code for the paper "Policy Optimization in RLHF: The Impact of Out-of-preference Data" ☆23 Updated 10 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆29 Updated 3 weeks ago
- Code for our paper: "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models" ☆51 Updated last year
- Domain-specific preference (DSP) data and customized RM fine-tuning. ☆24 Updated 8 months ago
- Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023). ☆14 Updated last year
- GenRM-CoT: Data release for verification rationales ☆15 Updated 3 weeks ago
- Self-Supervised Alignment with Mutual Information ☆14 Updated 5 months ago
- Learning adapter weights from task descriptions ☆15 Updated last year
- [EMNLP 2023] Once Upon a *Time* in *Graph*: Relative-Time Pretraining for Complex Temporal Reasoning ☆18 Updated last year
- Code for "Inducer-tuning: Connecting Prefix-tuning and Adapter-tuning" (EMNLP 2022) and "Empowering Parameter-Efficient Transfer Learning… ☆11 Updated last year
- Implementation of the paper "Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners" ☆20 Updated last year
- Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆13 Updated 3 weeks ago
- Official repo for "Towards Uncertainty-Aware Language Agent". ☆22 Updated 2 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? ☆23 Updated 5 months ago
- [EMNLP 2023 Findings] Efficiently Enhancing Zero-Shot Performance of Instruction Following Model via Retrieval of Soft Prompt ☆20 Updated last year
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments ☆39 Updated last month
- [EMNLP 2022] Code for our paper “ZeroGen: Efficient Zero-shot Learning via Dataset Generation”. ☆16 Updated 2 years ago
- Code for LaMPP: Language Models as Probabilistic Priors for Perception and Action ☆35 Updated last year