Raj-08 / Q-Flow
Complete Reinforcement Learning Toolkit for Large Language Models!
☆15Updated last month
Alternatives and similar repositories for Q-Flow:
Users who are interested in Q-Flow are comparing it to the libraries listed below:
- ☆16Updated 9 months ago
- ☆20Updated 5 months ago
- Self-Supervised Alignment with Mutual Information☆17Updated 11 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆30Updated last week
- [ICLR 2025] Code for the paper "Implicit Search via Discrete Diffusion: A Study on Chess"☆24Updated last month
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings.☆25Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated last year
- Repository for Skill Set Optimization☆12Updated 9 months ago
- ☆20Updated 10 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆47Updated 4 months ago
- Official Repository of Are Your LLMs Capable of Stable Reasoning?☆25Updated last month
- o1 Chain of Thought Examples☆33Updated 6 months ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆26Updated last year
- ☆23Updated 10 months ago
- Code and models for EMNLP 2024 paper "WPO: Enhancing RLHF with Weighted Preference Optimization"☆39Updated 7 months ago
- ☆14Updated 11 months ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".☆26Updated last month
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"☆32Updated 11 months ago
- Source code for GreaTer ICLR 2025 - Gradient Over Reasoning makes Smaller Language Models Strong Prompt Optimizers☆20Updated last week
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment☆55Updated 10 months ago
- ☆25Updated 6 months ago
- Official implementation of the paper "Process Reward Model with Q-value Rankings"☆56Updated 2 months ago
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails☆20Updated 2 months ago
- [EMNLP 2023] Knowledge Rumination for Pre-trained Language Models☆17Updated last year
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆26Updated 4 months ago
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…☆26Updated last year
- exploring whether LLMs perform case-based or rule-based reasoning☆28Updated last year
- Complexity Based Prompting for Multi-Step Reasoning☆17Updated 2 years ago
- ☆20Updated 2 months ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement)☆48Updated last year