Direct preference optimization with f-divergences.
☆16Nov 3, 2024Updated last year
Alternatives and similar repositories for f-divergence-dpo
Users that are interested in f-divergence-dpo are comparing it to the libraries listed below.
- Sampling-Based Minimum Bayes-Risk Decoding for Neural Machine Translation☆16Oct 14, 2022Updated 3 years ago
- ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation☆11Dec 23, 2023Updated 2 years ago
- This repository contains code for the paper "Better Estimation of the KL Divergence Between Language Models"☆19May 30, 2025Updated 10 months ago
- ☆17Aug 30, 2025Updated 7 months ago
- for DTCA model☆10Oct 17, 2023Updated 2 years ago
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment☆56Jun 16, 2024Updated last year
- Robust and safe deep reinforcement learning algorithms☆17Mar 27, 2024Updated 2 years ago
- ☆12Feb 22, 2021Updated 5 years ago
- NeurIPS 2025: Discriminative Constrained Optimization for Reinforcing Large Reasoning Models☆53Mar 14, 2026Updated last month
- [ICLR 2025] This repository contains the code to reproduce the results from our paper From Sparse Dependence to Sparse Attention: Unveili…☆12Mar 7, 2025Updated last year
- Code of the paper: Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function☆13Updated this week
- Source code for "When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian …☆10May 30, 2021Updated 4 years ago
- Trust Region Preference Approximation: A simple and stable reinforcement learning algorithm for LLM reasoning☆15Jun 28, 2025Updated 9 months ago
- 2021 "AI Earth" Artificial Intelligence Innovation Challenge: AI for accurate weather and ocean forecasting☆13Apr 7, 2021Updated 5 years ago
- Evaluating and improving the faithfulness of the interpretations offered by Neural Module Networks☆13Jun 12, 2023Updated 2 years ago
- ☆16Jul 29, 2025Updated 8 months ago
- Contrastive self-supervised learning using Rényi divergence☆14Oct 21, 2022Updated 3 years ago
- Official code for ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning (AAAI'24)☆17Feb 10, 2024Updated 2 years ago
- ☆29Oct 8, 2025Updated 6 months ago
- ☆14Mar 5, 2024Updated 2 years ago
- RL algorithm: Advantage induced policy alignment☆66Aug 11, 2023Updated 2 years ago
- Code for ICML21 paper "Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation"☆12Feb 8, 2023Updated 3 years ago
- A collection of the basic knowledge, tools, models, algorithms, code, and more encountered while learning artificial intelligence (AI).☆14Mar 10, 2019Updated 7 years ago
- VQ-VAE implementation in PyTorch, supporting EMA and Gumbel training. Applicable to images and time series.☆11Oct 19, 2022Updated 3 years ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆27Apr 17, 2024Updated 2 years ago
- ScreenExplorer: Training a Vision-Language Model for Diverse Exploration in Open GUI World☆25Jun 17, 2025Updated 10 months ago
- [ICML 2025] Official code of "DAMA: Data- and Model-aware Alignment of Multi-modal LLMs"☆16May 24, 2025Updated 10 months ago
- Accelerating RL for LLM Reasoning with Optimal Advantage Regression☆40May 30, 2025Updated 10 months ago
- Video Summarization Transformer: Implementation in PyTorch of the Transformer model for video summarisation☆10Oct 27, 2020Updated 5 years ago
- A Python tool that generates LaTeX code (e.g. tables, matrices).☆10Jun 22, 2022Updated 3 years ago
- ☆14Oct 7, 2024Updated last year
- A transformer model that should be able to solve a simple NER task☆11Mar 7, 2019Updated 7 years ago
- ☆19Jun 3, 2024Updated last year
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward☆951Feb 16, 2025Updated last year
- ☆16Jun 14, 2023Updated 2 years ago
- ☆16May 22, 2025Updated 10 months ago
- A reinforcement learning agent playing as the turret, where its goal is to allow ten friendly units to enter the base, and loses if an en…☆14Dec 24, 2020Updated 5 years ago
- This project presents a categorization of selected papers from top information retrieval and data mining conferences in 2022.☆17Jun 13, 2022Updated 3 years ago
- ☆28Aug 18, 2023Updated 2 years ago