jwliao-ai / MARFT
☆16Updated this week
Alternatives and similar repositories for MARFT:
Users who are interested in MARFT are comparing it to the libraries listed below
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".☆26Updated last month
- A testbed for agents and environments that can automatically improve models through data generation.☆23Updated last month
- Natural Language Reinforcement Learning☆87Updated 4 months ago
- SCoRe: Training Language Models to Self-Correct via Reinforcement Learning☆9Updated 3 months ago
- ☆15Updated 5 months ago
- ☆55Updated last month
- ☆25Updated 6 months ago
- this is for fun, ain't it grand!☆16Updated last year
- ☆15Updated this week
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…☆28Updated last year
- implementation of dualformer☆15Updated last month
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients☆26Updated 7 months ago
- Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆15Updated last month
- ☆20Updated 2 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers☆17Updated last month
- ☆16Updated 9 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"☆11Updated last week
- Official Code Repository for EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents (COLM 2024)☆29Updated 9 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆41Updated 10 months ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment☆16Updated 8 months ago
- ☆18Updated 5 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling☆29Updated last month
- ☆36Updated last month
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated last year
- This repository is the official implementation of the TRAC optimizer in Fast TRAC: A Parameter-Free Optimizer for Lifelong Reinforcement …☆25Updated 6 months ago
- Self-Supervised Alignment with Mutual Information☆17Updated 11 months ago
- Implementation of ICML 2023 paper: Future-conditioned Unsupervised Pretraining for Decision Transformer☆27Updated last year
- ☆30Updated 5 months ago
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"☆10Updated 3 weeks ago
- ☆14Updated 3 months ago