ruizheng20 / gpo
Code for the paper "Toward Optimal LLM Alignments Using Two-Player Games".
☆17 · Jun 20, 2024 · Updated last year
Alternatives and similar repositories for gpo
Users who are interested in gpo are comparing it to the repositories listed below.
- Implicit Differentiable Optimal Control (IDOC) with JAX ☆12 · May 11, 2022 · Updated 3 years ago
- ☆13 · Feb 11, 2021 · Updated 5 years ago
- [NeurIPS 2025] Reasoning Models Better Express Their Confidence ☆22 · Nov 19, 2025 · Updated 2 months ago
- ☆18 · Feb 7, 2021 · Updated 5 years ago
- Official PyTorch Implementation for Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning ☆19 · Jan 11, 2023 · Updated 3 years ago
- ☆23 · Oct 21, 2024 · Updated last year
- ☆20 · Oct 15, 2022 · Updated 3 years ago
- OpenAI Gym environment for DART robotics simulator. ☆22 · Apr 17, 2018 · Updated 7 years ago
- Plannable Approximations to MDP Homomorphisms: Equivariance under Actions ☆30 · Jun 30, 2020 · Updated 5 years ago
- [ICLR 22] Value Gradient weighted Model-Based Reinforcement Learning. ☆25 · Apr 15, 2023 · Updated 2 years ago
- Tools for visualizing and comparing data from vertebrate retinas ☆14 · Jan 20, 2025 · Updated last year
- Code for minimum-entropy coupling. ☆32 · Jan 6, 2026 · Updated last month
- TD-Regularized Actor-Critic Methods ☆36 · Dec 26, 2019 · Updated 6 years ago
- Repo for the paper "Landscape Surrogate Learning Decision Losses for Mathematical Optimization Under Partial Information" ☆38 · Jul 20, 2023 · Updated 2 years ago
- Code release for "Stochastic Optimal Control Matching" ☆39 · Aug 14, 2024 · Updated last year
- JAX implementation of VQVAE/VQGAN autoencoders (+FSQ) ☆41 · Jun 6, 2024 · Updated last year
- Understanding Short-Horizon Bias in Stochastic Meta-Optimization ☆37 · Mar 8, 2018 · Updated 7 years ago
- Tree-structured recurrent switching linear dynamical systems ☆38 · Jul 13, 2020 · Updated 5 years ago
- Code repository for scenarios and environment setup as part of ITBench ☆15 · Feb 10, 2026 · Updated last week
- 👀 VITRina: VIsual Token Representations ☆11 · Jun 15, 2023 · Updated 2 years ago
- [NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms" ☆23 · Oct 23, 2025 · Updated 3 months ago
- ☆11 · Jun 15, 2019 · Updated 6 years ago
- ☆10 · Feb 17, 2019 · Updated 7 years ago
- Kernel Playground - A playground to run large scale experiments on the Linux Kernel ☆17 · Nov 8, 2025 · Updated 3 months ago
- Implementation for ICML 2019 paper, EMI: Exploration with Mutual Information. ☆36 · Dec 7, 2020 · Updated 5 years ago
- How to train a neural ODE for time series/weather forecasting ☆39 · Feb 6, 2023 · Updated 3 years ago
- Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions (ICML 2020) ☆43 · Dec 8, 2022 · Updated 3 years ago
- Yet another reinforcement learning package ☆12 · May 24, 2022 · Updated 3 years ago
- The package is developed for treatment recommendation & pairwise treatment individual effect estimation (ITE/CATE/HTE) when multiple trea… ☆11 · Mar 9, 2023 · Updated 2 years ago
- MATLAB implementation of the universal directed information estimators in Jiantao Jiao, Haim H. Permuter, Lei Zhao, Young-Han Kim, and Ts… ☆11 · Apr 2, 2019 · Updated 6 years ago
- Official implementation of Tabular Transfer Learning via Prompting LLMs (COLM 2024). ☆12 · Aug 6, 2024 · Updated last year
- Code and Dataset release of "Carpe Diem: On the Evaluation of World Knowledge in Lifelong Language Models" (NAACL 2024) ☆10 · Oct 16, 2024 · Updated last year
- Layered distributions using FLAX/JAX ☆10 · Dec 13, 2020 · Updated 5 years ago
- Code for "Using Embeddings to Correct for Unobserved Confounding" ☆10 · May 31, 2019 · Updated 6 years ago
- A Random Matrix Approach to Extreme Learning Machine ☆15 · Feb 23, 2018 · Updated 7 years ago
- Cone program refinement ☆10 · Mar 6, 2020 · Updated 5 years ago
- Gym wrapper for Vizdoom environments ☆12 · Dec 14, 2018 · Updated 7 years ago
- My marketing projects applying AI ☆11 · Oct 31, 2025 · Updated 3 months ago
- Attentional mechanism incorporated in Asynchronous Advantage Actor-Critic (A3C/A2C, DeepMind) ☆10 · Jan 9, 2018 · Updated 8 years ago