waterhorse1 / Natural-language-RL
☆18Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for Natural-language-RL
- A repository for research on medium sized language models.☆74Updated 6 months ago
- Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"☆44Updated 10 months ago
- ☆28Updated 5 months ago
- The official implementation of Self-Exploring Language Models (SELM)☆55Updated 5 months ago
- ☆55Updated last month
- ☆64Updated 7 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆75Updated last month
- ☆15Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆46Updated 2 months ago
- ☆57Updated 2 weeks ago
- Minimal implementation of the "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" paper (arXiv:2401.01335)☆28Updated 8 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"☆39Updated last month
- [ACL 2024] Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning☆31Updated 3 months ago
- ☆59Updated last month
- Official implementation for "Law of the Weakest Link: Cross capabilities of Large Language Models"☆38Updated last month
- ☆22Updated 2 months ago
- Official implementation of ECCV24 paper: POA☆24Updated 3 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated 9 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI☆83Updated last week
- Codebase for Instruction Following without Instruction Tuning☆32Updated 2 months ago
- Repository for Skill Set Optimization☆12Updated 3 months ago
- DPO, but faster 🚀☆23Updated 3 weeks ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆30Updated 9 months ago
- ☆46Updated 2 weeks ago
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"☆50Updated 7 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆104Updated 6 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆63Updated last month
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.…☆37Updated 4 months ago
- ☆64Updated last month
- [NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents☆46Updated 2 weeks ago