Implementation of Direct Preference Optimization
☆17 · Jul 17, 2023 · Updated 2 years ago
Alternatives and similar repositories for DPO
Users who are interested in DPO are comparing it to the libraries listed below.
- Project Euler GPT Resolver ☆10 · Feb 12, 2024 · Updated 2 years ago
- ☆28 · Oct 30, 2025 · Updated 6 months ago
- Source for paper, "Data organization in spreadsheets" ☆22 · Sep 30, 2021 · Updated 4 years ago
- PyTorch implementations for Offline Preference-Based RL (PbRL) algorithms ☆21 · Mar 24, 2025 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆31 · Mar 12, 2024 · Updated 2 years ago
- ☆13 · Sep 7, 2024 · Updated last year
- FID computation in Jax/Flax. ☆29 · Jul 17, 2024 · Updated last year
- Backup of the sources for my SJPO Teaching Notes ☆10 · Apr 15, 2019 · Updated 7 years ago
- Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning ☆29 · Feb 21, 2022 · Updated 4 years ago
- Experiments to train transformer network to master reinforcement learning environments. ☆32 · Mar 14, 2021 · Updated 5 years ago
- ☆30 · Mar 1, 2022 · Updated 4 years ago
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆15 · Jun 21, 2024 · Updated last year
- Code for ICLR 2025 Paper "GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment" ☆23 · Feb 10, 2025 · Updated last year
- Database for International Physics Olympiads ☆11 · Sep 22, 2025 · Updated 8 months ago
- Some notes and solutions to "Machine Learning" authored by Zhi-Hua Zhou ☆11 · Jul 20, 2021 · Updated 4 years ago
- ☆22 · Feb 4, 2026 · Updated 3 months ago
- Repo for paper: Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge ☆14 · Feb 20, 2024 · Updated 2 years ago
- Code for the paper: Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains ☆10 · Nov 12, 2021 · Updated 4 years ago
- self-adaptive in-context learning ☆45 · May 5, 2023 · Updated 3 years ago
- (AAAI'2019) The codes, models, logs, and data for an extended paper of the original paper "On Reinforcement Learning for Full-length Game… ☆33 · Oct 5, 2022 · Updated 3 years ago
- Faster RCNN using TensorFlow ☆10 · Jul 31, 2022 · Updated 3 years ago
- Barebones Rust EVM Implementation ☆12 · Feb 9, 2022 · Updated 4 years ago
- ☆10 · Sep 19, 2023 · Updated 2 years ago
- CEVAE(Causal Effect Variational AutoEncoder) written with pytorch and pyro. ☆10 · Feb 15, 2021 · Updated 5 years ago
- Multilingual acoustic word embedding approaches applied and evaluated on GlobalPhone data. ☆11 · Nov 3, 2020 · Updated 5 years ago
- A cog model for the all-mpnet-base-v2 sentence-transformers embedding model. ☆15 · Jan 3, 2024 · Updated 2 years ago
- The official code of our paper “RAG-Critic: Leveraging Automated Critic-Guided Agentic Workflow for Retrieval Augmented Generation” ☆32 · Aug 19, 2025 · Updated 9 months ago
- This is MPE-pytorch, fix some bugs. ☆11 · Apr 26, 2020 · Updated 6 years ago
- A set of tools for use with the huff language. ☆21 · Jun 24, 2022 · Updated 3 years ago
- Imitation learning from multiple experts ☆13 · Aug 29, 2022 · Updated 3 years ago
- ☆10 · Nov 27, 2019 · Updated 6 years ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆68 · Dec 10, 2024 · Updated last year
- Code for Powderworld: A Platform for Understanding Generalization via Rich Task Distributions ☆74 · Aug 31, 2024 · Updated last year
- Triton implementation of GPT/LLAMA ☆21 · Aug 28, 2024 · Updated last year
- ☆15 · Jan 8, 2023 · Updated 3 years ago
- An implementation of Variational State Tabulation, from the paper here: https://arxiv.org/abs/1802.04325. ☆14 · Feb 25, 2024 · Updated 2 years ago
- Course Info for VIP-GEAI ☆11 · Apr 11, 2024 · Updated 2 years ago
- Source code for some notes for the mathematical tripos. ☆23 · Dec 23, 2018 · Updated 7 years ago
- Code for ICLR 2022 Paper (HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning) ☆12 · Nov 28, 2023 · Updated 2 years ago