Implementation of Direct Preference Optimization
☆17 · Jul 17, 2023 · Updated 2 years ago
Alternatives and similar repositories for DPO
Users who are interested in DPO are comparing it to the libraries listed below.
- Official code for ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (AAAI'24) ☆17 · Feb 10, 2024 · Updated 2 years ago
- Latent Large Language Models ☆19 · Aug 24, 2024 · Updated last year
- ☆27 · Oct 30, 2025 · Updated 4 months ago
- Source for paper, "Data organization in spreadsheets" ☆22 · Sep 30, 2021 · Updated 4 years ago
- PyTorch implementations for Offline Preference-Based RL (PbRL) algorithms ☆21 · Mar 24, 2025 · Updated last year
- Standalone library of frequently-used wrappers for dm_env environments. ☆19 · Jul 9, 2024 · Updated last year
- ☆14 · Jun 24, 2024 · Updated last year
- Starter template for your ML/AI projects (uv package manager, RestAPI with FastAPI and Dockerfile support) ☆33 · Jan 13, 2025 · Updated last year
- Code for MOBILE: Model-Bellman Inconsistency Penalized Offline Policy Optimization ☆23 · Apr 17, 2024 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆31 · Mar 12, 2024 · Updated 2 years ago
- ☆11 · Sep 7, 2024 · Updated last year
- FID computation in Jax/Flax. ☆29 · Jul 17, 2024 · Updated last year
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning ☆29 · Feb 21, 2022 · Updated 4 years ago
- Experiments to train transformer network to master reinforcement learning environments. ☆32 · Mar 14, 2021 · Updated 5 years ago
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆32 · Jul 28, 2023 · Updated 2 years ago
- Mamba support for transformer lens ☆19 · Sep 17, 2024 · Updated last year
- Code for ICLR 2025 Paper "GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment" ☆20 · Feb 10, 2025 · Updated last year
- ☆30 · Mar 1, 2022 · Updated 4 years ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆14 · Jun 21, 2024 · Updated last year
- Frechet inception distance (FID) evaluation in JAX ☆14 · May 28, 2024 · Updated last year
- EleutherAI ML Performance reading group repository (slides, meeting recordings, annotated papers) ☆31 · Updated this week
- RLA is a tool for managing your RL experiments automatically ☆72 · Feb 7, 2023 · Updated 3 years ago
- ☆22 · Feb 4, 2026 · Updated last month
- Some notes and solutions to "Machine Learning" authored by Zhi-Hua Zhou ☆11 · Jul 20, 2021 · Updated 4 years ago
- Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge ☆14 · Feb 20, 2024 · Updated 2 years ago
- Code for CascadeBERT, Findings of EMNLP 2021 ☆12 · Mar 30, 2022 · Updated 3 years ago
- This course navigates through the LLMOps pipeline, enabling you to preprocess training data for supervised fine-tuning and deploy cust… ☆14 · Feb 13, 2024 · Updated 2 years ago
- Code for the paper: Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains ☆10 · Nov 12, 2021 · Updated 4 years ago
- (AAAI'2019) The codes, models, logs, and data for an extended paper of the original paper "On Reinforcement Learning for Full-length Game… ☆33 · Oct 5, 2022 · Updated 3 years ago
- Distributed RL Implementation using Pytorch and Ray (ApeX(Ape-X), A3C, Distributed-PPO(DPPO), Impala) ☆27 · Jun 8, 2022 · Updated 3 years ago
- Faster RCNN using TensorFlow ☆10 · Jul 31, 2022 · Updated 3 years ago
- Neural Networks exam project. Machine learning algorithm: implementation of FGSM and JSMA attacks by Goodfellow and Papernot. ☆16 · Jan 13, 2026 · Updated 2 months ago
- The official code of our paper “RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented Generation” ☆27 · Aug 19, 2025 · Updated 7 months ago
- 🛠 Robust SSH: auto-reconnect SSH session that preserves your running shell and command. Intuitive, no server-side setup, aimed at simplic… ☆13 · Nov 14, 2025 · Updated 4 months ago
- ☆13 · Feb 1, 2024 · Updated 2 years ago
- Barebones Rust EVM Implementation ☆12 · Feb 9, 2022 · Updated 4 years ago
- ☆10 · Sep 19, 2023 · Updated 2 years ago
- The evaluation code for A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5 ☆53 · Jan 18, 2026 · Updated 2 months ago
- Multilingual acoustic word embedding approaches applied and evaluated on GlobalPhone data. ☆11 · Nov 3, 2020 · Updated 5 years ago