sahsaeedi / triple-preference-optimization

☆18

Related projects

Alternatives and complementary repositories for triple-preference-optimization

ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆55 Updated 3 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆71 Updated last month
yihedeng9 / STIC
Enhancing Large Vision Language Models with Self-Training on Image Comprehension.
☆59 Updated 5 months ago
haozheji / exact-optimization
ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
☆46 Updated 5 months ago
sail-sg / CPO
[NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.
☆63 Updated last month
UNITES-Lab / MC-SMoE
[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆64 Updated 5 months ago
hbin0701 / Self-Explore
[EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards
☆44 Updated 6 months ago
haonan3 / AnchorContext
AnchorAttention: Improved attention for LLMs long-context training
☆142 Updated this week
google / spiqa
Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers"
☆40 Updated last month
GATECH-EIC / ACT
[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…
☆24 Updated 4 months ago
HKUNLP / diffusion-of-thoughts
[NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"
☆84 Updated 8 months ago
Edward-Sun / easy-to-hard
Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
☆97 Updated 2 months ago
shenao-zhang / SELM
The official implementation of Self-Exploring Language Models (SELM)
☆55 Updated 5 months ago
JieyuZ2 / TaskMeAnything
[NeurIPS 2024] A task generation and model evaluation system for multimodal language models.
☆56 Updated last month
Wang-ML-Lab / multimodal-needle-in-a-haystack
Code and data for the benchmark "Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Lan…
☆34 Updated 4 months ago
GAIR-NLP / weak-to-strong-reasoning
☆54 Updated 2 months ago
HKUNLP / DiffuLLaMA
DiffuGPT and DiffuLLaMA: Scaling Diffusion Language Models via Adaptation from Autoregressive Models
☆58 Updated 3 weeks ago
WooooDyy / LLM-Reverse-Curriculum-RL
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…
☆74 Updated 9 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆47 Updated last month
zitian-gao / SC-MCTS
Interpretable Contrastive Monte Carlo Tree Search Reasoning
☆24 Updated 2 weeks ago
abacusai / smaug
☆61 Updated 9 months ago
rookie-joe / AutoPSV
☆31 Updated 3 weeks ago
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆51 Updated 2 months ago
r-three / smear
☆27 Updated last year
mukhal / grace
[EMNLP 2023, Findings] GRACE: Discriminator-Guided Chain-of-Thought Reasoning
☆44 Updated last month
Yu-Fangxu / FoR
Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples
☆39 Updated last month
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆39 Updated 3 months ago
FreedomIntelligence / OVM
☆51 Updated 7 months ago
luka-group / vlm-knowledge-conflict
Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."
☆34 Updated last month
architsharma97 / dpo-rlaif
☆90 Updated 4 months ago