jinpz/q_sharp

The official code release for Q#: Provably Optimal Distributional RL for LLM Post-Training

☆20

Alternatives and similar repositories for q_sharp

Users who are interested in q_sharp are comparing it to the repositories listed below.

kilian-group / phantom-wiki
Python package for generating datasets to evaluate reasoning and retrieval of large language models
☆25 · Jun 3, 2026 · Updated last month
mansicer / Q-Adapter
Implementation of ICLR 2025 paper "Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation"
☆18 · Oct 5, 2024 · Updated last year
cxy1997 / graphite-utils
☆27 · Apr 9, 2024 · Updated 2 years ago
microsoft / Intrepid
INTeractive learning via REPresentatIon Discovery
☆36 · Jun 2, 2024 · Updated 2 years ago
JesseFarebro / distributional-sr
Official implementation of the δ-model presented in the ICML 2024 paper "A Distributional Analogue to the Successor Representation".
☆23 · Nov 8, 2024 · Updated last year
kaiwenw / JoinGym
A lightweight RL environment for query optimization.
☆16 · Sep 13, 2024 · Updated last year
stacyste / TheoryOfMindInferenceModels
☆28 · Nov 22, 2019 · Updated 6 years ago
morning9393 / ETPO
☆14 · Mar 5, 2024 · Updated 2 years ago
RUCKBReasoning / CodeRM
Official code implementation for the ACL 2025 paper: 'Dynamic Scaling of Unit Tests for Code Reward Modeling'
☆27 · May 16, 2025 · Updated last year
luisfelipewb / RL4WasteCapture
A Deep Reinforcement Learning Strategy and Framework for Floating Waste Capture
☆13 · Mar 13, 2025 · Updated last year
likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆40 · Jun 10, 2024 · Updated 2 years ago
EnricoCancelli / ProximitySocialNav
Code repository for the paper "Exploiting Proximity-Aware Tasks for Embodied Social Navigation"
☆12 · Nov 16, 2023 · Updated 2 years ago
gcucurull / jax-gat
JAX implementation of Graph Attention Networks
☆13 · Jan 29, 2022 · Updated 4 years ago
rosieyzh / openrlhf-pretrain
Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"
☆29 · Oct 14, 2025 · Updated 9 months ago
ZhaolinGao / A-PO
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
☆41 · May 30, 2025 · Updated last year
VainF / Remix-DiT
☆18 · Dec 11, 2024 · Updated last year
tianxusky / Code-for-Error-Bounds-of-Imitating-Policies-and-Environments
☆10 · Oct 15, 2020 · Updated 5 years ago
ltzheng / CurriculumMARL
Code of "Towards Skilled Population Curriculum for MARL" + Implementation of Curriculum MARL algorithms based on Ray
☆13 · Feb 20, 2023 · Updated 3 years ago
nng555 / cluster_examples
☆19 · Apr 2, 2020 · Updated 6 years ago
JesseFarebro / xtils
A collection of utilities for machine learning experiments.
☆11 · Jan 8, 2026 · Updated 6 months ago
stalhabukhari / comp-sdf-dyn-nav
Code for ICRA'25 paper: "Differentiable Composite Neural Signed Distance Fields for Robot Navigation in Dynamic Indoor Environments"
☆16 · Apr 2, 2025 · Updated last year
roger-creus / ale-nl
A framework for evaluating LLMs in Atari games
☆15 · Apr 21, 2025 · Updated last year
Trae1ounG / Pretrain_Space_RLVR
[arxiv: 2604.14142] From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
☆17 · Apr 16, 2026 · Updated 3 months ago
ZhanboShiAI / ENMuS
[AAAI 2025] Towards Audio-visual Navigation in Noisy Environments: A Large-scale Benchmark Dataset and An Architecture Considering Multip…
☆15 · May 21, 2026 · Updated 2 months ago
leonmakise / PFGEPFR
Official implementation of CVPR2021 'Pseudo Facial Generation with Extreme Poses for Face Recognition'
☆14 · May 31, 2022 · Updated 4 years ago
wangjw55 / DILLM
Code and Data for Paper: Boosting Efficient Reinforcement Learning for Vision-and-Language Navigation With Open-Sourced LLM
☆18 · Feb 7, 2025 · Updated last year
hcmlab / GANterfactual-RL
Counterfactual explanations for Reinforcement Learning agents on Atari
☆12 · Apr 3, 2023 · Updated 3 years ago
Shentao-YANG / Preference_Grounded_Guidance
Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
☆17 · Jan 8, 2025 · Updated last year
google / putting-dune
☆10 · Feb 20, 2024 · Updated 2 years ago
bigai-nlco / RuleReasoner
[ICLR 2026] RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling
☆39 · Feb 25, 2026 · Updated 4 months ago
weiminye / Hands-On-Artificial-Intelligence-for-Banking-Chinese
Companion code for the book 《金融中的人工智能》 ("Artificial Intelligence in Finance")
☆11 · Sep 20, 2022 · Updated 3 years ago
Yui010206 / Adaptive-Visual-Imagination-Control
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
☆18 · Jun 2, 2026 · Updated last month
zecevic-matej / ESSAI-2023-Causality
European Summer School on AI Course "Machines Climbing Pearl's Ladder of Causation"
☆13 · Feb 20, 2024 · Updated 2 years ago
rovle / gpt3-in-context-fitting
Experiments on GPT-3's ability to fit numerical models in-context.
☆14 · Aug 11, 2022 · Updated 3 years ago
kchua / mbrl-jax
MBRL library in JAX
☆10 · Sep 22, 2022 · Updated 3 years ago
mschweizer / Pref-RL
Pref-RL provides ready-to-use PbRL agents that are easily extensible.
☆11 · Aug 31, 2022 · Updated 3 years ago
notmahi / disk
PyTorch implementation for "Discovery of Incremental Skills" (DISk) algorithm from ICLR 2022 paper "One After Another: Learning Increment…
☆21 · Mar 22, 2022 · Updated 4 years ago
VArdulov / ToMNet
Reimplementation of ToMNet with some extensions for RL as well
☆14 · Apr 28, 2018 · Updated 8 years ago
microsoft / lightATAC
A lightweight reimplementation of Adversarially Trained Actor Critic
☆19 · Mar 19, 2026 · Updated 4 months ago