zoe-yyx / Awesome-AIAgent-Protocol
☆13Updated last month
Alternatives and similar repositories for Awesome-AIAgent-Protocol
Users that are interested in Awesome-AIAgent-Protocol are comparing it to the libraries listed below
- MARFT stands for Multi-Agent Reinforcement Fine-Tuning. This repository implements an LLM-based multi-agent reinforcement fine-tuning fra…☆35Updated 2 weeks ago
- ☆30Updated 8 months ago
- Baseline for NeurIPS_Auto_Bidding_General_Track☆33Updated 9 months ago
- ☆33Updated 8 months ago
- Natural Language Reinforcement Learning☆89Updated 5 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$☆45Updated 7 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization☆79Updated 9 months ago
- ☆30Updated 7 months ago
- Official implementation of the ICLR 2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆58Updated 2 months ago
- ☆114Updated 4 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆110Updated this week
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆177Updated last month
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆102Updated last year
- Repo of "Large Language Model-based Human-Agent Collaboration for Complex Task Solving" (EMNLP 2024 Findings)☆32Updated 8 months ago
- Repo of paper "Free Process Rewards without Process Labels"☆149Updated 2 months ago
- [ICLR 2025] Official code of "Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization"☆14Updated last year
- ☆210Updated 2 weeks ago
- The LLMOPT project offers a comprehensive set of resources, including the model, dataset, training framework, and inference code, enablin…☆64Updated last month
- [ICLR 2025] Benchmarking Agentic Workflow Generation☆94Updated 3 months ago
- ☆11Updated last week
- ☆97Updated last year
- ☆27Updated 8 months ago
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…☆35Updated 2 months ago
- An index of algorithms for reinforcement learning from human feedback (RLHF)☆92Updated last year
- e☆35Updated last month
- ICML 2025 Spotlight☆90Updated this week
- OptiBench and ReSocratic Synthesis Method☆23Updated 2 months ago
- Yelp Simulator for WWW'25 AgentSociety Challenge☆80Updated last month
- Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors☆24Updated 3 weeks ago
- A Survey of Personalization: From RAG to Agent☆43Updated last month