Raj-08 / Q-Flow
Complete Reinforcement Learning Toolkit for Large Language Models!
☆19 Updated 3 months ago
Alternatives and similar repositories for Q-Flow
Users who are interested in Q-Flow are comparing it to the libraries listed below.
- ☆20 Updated 7 months ago
- [ICLR 2025] Code for the paper "Implicit Search via Discrete Diffusion: A Study on Chess" ☆28 Updated 3 months ago
- implementation of dualformer ☆17 Updated 4 months ago
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆29 Updated last year
- ☆16 Updated 11 months ago
- official implementation of paper "Process Reward Model with Q-value Rankings" ☆59 Updated 4 months ago
- Repository for Skill Set Optimization ☆13 Updated 11 months ago
- Minimal RLHF implementation built on top of minGPT. ☆29 Updated 11 months ago
- Advantage Leftover Lunch Reinforcement Learning (A-LoL RL): Improving Language Models with Advantage-based Offline Policy Gradients ☆26 Updated 9 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆50 Updated 3 weeks ago
- ☆86 Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" ☆17 Updated last year
- Official implementation of ICML 2025 paper "Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment" (https:… ☆25 Updated last month
- ☆31 Updated 8 months ago
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings. ☆26 Updated last year
- Self-Supervised Alignment with Mutual Information ☆19 Updated last year
- Open-Source LLM Coders with Co-Evolving Reinforcement Learning ☆87 Updated 3 weeks ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning ☆49 Updated 7 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆30 Updated last week
- On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability ☆39 Updated 2 months ago
- ☆16 Updated 7 months ago
- Natural Language Reinforcement Learning ☆89 Updated 6 months ago
- ☆14 Updated last year
- ☆42 Updated 2 months ago
- Tree prompting: easy-to-use scikit-learn interface for improved prompting. ☆37 Updated last year
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World". ☆27 Updated 3 months ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆23 Updated 3 months ago
- Code for ACL2024 paper - Adversarial Preference Optimization (APO). ☆54 Updated last year
- ☆16 Updated 4 months ago
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF ☆19 Updated 8 months ago