Implementation of Direct Preference Optimization
☆17 · Jul 17, 2023 · Updated 2 years ago
Alternatives and similar repositories for DPO
Users who are interested in DPO are comparing it to the libraries listed below.
- Official code for ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (AAAI'24) ☆17 · Feb 10, 2024 · Updated 2 years ago
- Latent Large Language Models ☆19 · Aug 24, 2024 · Updated last year
- [NeurIPS'20] Code for the paper "Offline Imitation Learning with a Misspecified Simulator" ☆12 · Nov 24, 2021 · Updated 4 years ago
- Project Euler GPT Resolver ☆10 · Feb 12, 2024 · Updated 2 years ago
- Official implementation of Harnessing Mixed Offline Reinforcement Learning Datasets via Trajectory Reweighting ☆16 · Feb 14, 2024 · Updated 2 years ago
- ☆27 · Oct 30, 2025 · Updated 6 months ago
- PyTorch implementations for Offline Preference-Based RL (PbRL) algorithms ☆21 · Mar 24, 2025 · Updated last year
- Standalone library of frequently-used wrappers for dm_env environments. ☆19 · Jul 9, 2024 · Updated last year
- GPT* - Training faster small transformers using ALiBi, Parallel Residual Connections and more! ☆20 · Oct 29, 2022 · Updated 3 years ago
- Code for MOBILE: Model-Bellman Inconsistency Penalized Offline Policy Optimization ☆23 · Apr 17, 2024 · Updated 2 years ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆31 · Mar 12, 2024 · Updated 2 years ago
- FID computation in Jax/Flax. ☆28 · Jul 17, 2024 · Updated last year
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning ☆29 · Feb 21, 2022 · Updated 4 years ago
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆32 · Jul 28, 2023 · Updated 2 years ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆15 · Jun 21, 2024 · Updated last year
- Frechet inception distance (FID) evaluation in JAX ☆14 · May 28, 2024 · Updated last year
- EleutherAI ML Performance reading group repository (slides, meeting recordings, annotated papers) ☆31 · Mar 20, 2026 · Updated last month
- RLA is a tool for managing your RL experiments automatically ☆71 · Feb 7, 2023 · Updated 3 years ago
- Repository example to run Blender in a Docker container. ☆14 · Aug 9, 2020 · Updated 5 years ago
- Code for ICLR 2025 Paper "GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment" ☆22 · Feb 10, 2025 · Updated last year
- ☆22 · Feb 4, 2026 · Updated 3 months ago
- Code for CascadeBERT, Findings of EMNLP 2021 ☆12 · Mar 30, 2022 · Updated 4 years ago
- Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge ☆14 · Feb 20, 2024 · Updated 2 years ago
- ☆10 · Jan 20, 2023 · Updated 3 years ago
- Code for the paper: Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains ☆10 · Nov 12, 2021 · Updated 4 years ago
- self-adaptive in-context learning ☆45 · May 5, 2023 · Updated 3 years ago
- (AAAI'2019) The codes, models, logs, and data for an extended paper of the original paper "On Reinforcement Learning for Full-length Game… ☆33 · Oct 5, 2022 · Updated 3 years ago
- Faster RCNN using TensorFlow ☆10 · Jul 31, 2022 · Updated 3 years ago
- Distributed RL Implementation using Pytorch and Ray (ApeX(Ape-X), A3C, Distributed-PPO(DPPO), Impala) ☆27 · Jun 8, 2022 · Updated 3 years ago
- ☆13 · Feb 1, 2024 · Updated 2 years ago
- Unsupervised Domain Adaptive Salient Object Detection Through Uncertainty-Aware Pseudo-Label Learning, AAAI Conference on Artificial Inte… ☆30 · May 25, 2023 · Updated 2 years ago
- Barebones Rust EVM Implementation ☆12 · Feb 9, 2022 · Updated 4 years ago
- ☆10 · Sep 19, 2023 · Updated 2 years ago
- CEVAE(Causal Effect Variational AutoEncoder) written with pytorch and pyro. ☆10 · Feb 15, 2021 · Updated 5 years ago
- Multilingual acoustic word embedding approaches applied and evaluated on GlobalPhone data. ☆11 · Nov 3, 2020 · Updated 5 years ago
- R package for 'Bayesian multivariate analysis of summary statistics' (Stephens Lab) ☆10 · Sep 17, 2020 · Updated 5 years ago
- The official code of our paper “RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented Generation” ☆29 · Aug 19, 2025 · Updated 8 months ago
- This is MPE-pytorch, fix some bugs. ☆11 · Apr 26, 2020 · Updated 6 years ago
- A set of tools for use with the huff language. ☆21 · Jun 24, 2022 · Updated 3 years ago