VPO: Leveraging the Number of Votes in Preference Optimization

Jae Hyeon Cho; Minkyung Park; Byung-Jun Lee

Authors

Jae Hyeon Cho Korea University
Minkyung Park Korea University
Byung-Jun Lee Korea University

Abstract

Direct Preference Optimization (DPO) trains a language model on human preference data, bypassing the explicit reward-modeling phase of Reinforcement Learning from Human Feedback (RLHF). By iterating over sentence pairs in a preference dataset, DPO improves generation quality by increasing the likelihood of preferred sentences relative to less favored ones. Preference datasets are typically labeled with votes or scores, which indicate whether a sentence pair exhibits a clear preference or remains controversial. However, existing methods do not fully exploit this information. In this paper, we propose a technique that leverages user voting data to better align language models with diverse subjective preferences. We employ the Bayesian Minimum Mean Square Error (Bayesian MMSE) estimator to model the probability that one generation is preferred over another. Using this estimated probability as a target, we introduce the Vote-based Preference Optimization (VPO) framework, which incorporates the number of votes on both sides to distinguish between controversial and clearly preferred generation pairs. Furthermore, we show that existing algorithms such as DPO and Identity Preference Optimization (IPO) can be extended within the proposed framework, yielding VDPO and VIPO, respectively. In our experiments, these algorithms outperform various existing methods, including their base algorithms. Additionally, our framework can be applied to reward modeling, showing that the approach is compatible with the broader RLHF pipeline.
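To make the vote-based idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each vote is an independent Bernoulli draw with a Beta(alpha, beta_prior) prior on the preference probability; under squared-error loss the Bayesian MMSE estimate is then the posterior mean. It further assumes the estimate is used as a soft label in a binary cross-entropy form of the DPO loss. The function names `mmse_preference` and `soft_dpo_loss`, the uniform default prior, and `beta=0.1` are hypothetical choices for illustration; the exact prior and VDPO objective in the paper may differ.

```python
import math


def mmse_preference(n_chosen: int, n_rejected: int,
                    alpha: float = 1.0, beta_prior: float = 1.0) -> float:
    """Posterior mean of P(chosen is preferred) given vote counts.

    Assumption: votes are Bernoulli with a Beta(alpha, beta_prior) prior, so the
    posterior is Beta(alpha + n_chosen, beta_prior + n_rejected). Its mean is the
    Bayesian MMSE estimate under squared-error loss.
    """
    return (n_chosen + alpha) / (n_chosen + n_rejected + alpha + beta_prior)


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def soft_dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
                  ref_chosen_logp: float, ref_rejected_logp: float,
                  target_p: float, beta: float = 0.1) -> float:
    """Hypothetical soft-target DPO loss for a single preference pair."""
    # Implicit reward margin, computed exactly as in standard DPO.
    margin = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # Cross-entropy against the soft target p instead of the hard label 1.
    return -(target_p * log_sigmoid(margin)
             + (1.0 - target_p) * log_sigmoid(-margin))


# A clear 9-1 pair versus a controversial 5-5 pair under a uniform prior.
clear = mmse_preference(9, 1)          # (9 + 1) / (10 + 2) = 0.8333...
controversial = mmse_preference(5, 5)  # (5 + 1) / (10 + 2) = 0.5
```

With a soft target p, the loss is minimized at margin = log(p / (1 - p)) rather than pushed toward infinity. A controversial pair with p = 0.5 is therefore pulled toward zero margin, while a target of 1 recovers the standard DPO loss, -log sigmoid(margin).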
