Optimizing language models for use in conversational agents requires large quantities of example dialogues. Increasingly, these dialogues
are synthetically generated with powerful
large language models (LLMs), especially in
domains where obtaining authentic human data
is challenging. One such domain is human resources (HR). In this context, we compare two
LLM-based dialogue generation methods for
producing HR job interviews, and assess which
method generates higher-quality dialogues, i.e.,
those more difficult to distinguish from genuine human discourse. The first method uses
a single prompt to generate the complete interview dialogue. The second method uses two agents, each driven by its own prompt, that converse with each other. To evaluate dialogue quality under each method, we
ask a judge LLM to determine, in pairwise comparisons of interviews, whether AI was used to generate them. We empirically
find that, at the expense of a sixfold increase
in token count, interviews generated with the
dual-prompt method achieve a win rate 2 to 10 times higher than that of interviews generated with the
single-prompt method. This difference remains
consistent regardless of whether GPT-4o or
Llama 3.3 70B is used for either interview generation or quality judging.
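
Below is a minimal Python sketch, under stated assumptions, of the three components the abstract refers to: single-prompt generation, dual-prompt (two-agent) generation, and LLM-based pairwise judging. It uses the OpenAI chat-completions client with GPT-4o; the prompts, the helper names single_prompt_interview, two_agent_interview, and judge_pair, the ten-turn limit, and the exact judging instruction are illustrative assumptions rather than the study's actual configuration.

```python
# Minimal sketch of the two generation set-ups and the pairwise judging step.
# Prompts, helper names, the turn limit, and the judging instruction are
# illustrative assumptions, not the exact configuration used in the paper.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # the study also uses Llama 3.3 70B via a compatible endpoint


def chat(messages):
    """One chat-completion call; returns the assistant's reply as text."""
    response = client.chat.completions.create(model=MODEL, messages=messages)
    return response.choices[0].message.content


def single_prompt_interview(job_description):
    """Single-prompt method: one prompt yields the entire interview transcript."""
    prompt = (
        "Write a realistic HR job interview transcript between an interviewer "
        f"and a candidate for the following position:\n{job_description}"
    )
    return chat([{"role": "user", "content": prompt}])


def two_agent_interview(job_description, max_turns=10):
    """Dual-prompt method: two separately prompted agents converse turn by turn."""
    interviewer_system = (
        "You are an HR interviewer. Ask one question per turn about this "
        f"position:\n{job_description}"
    )
    candidate_system = "You are a job candidate. Answer the interviewer's last question."
    transcript = []
    for _ in range(max_turns):
        # Interviewer agent speaks, conditioned on the transcript so far.
        question = chat([
            {"role": "system", "content": interviewer_system},
            {"role": "user", "content": "\n".join(transcript) or "Begin the interview."},
        ])
        transcript.append(f"Interviewer: {question}")
        # Candidate agent replies to the updated transcript.
        answer = chat([
            {"role": "system", "content": candidate_system},
            {"role": "user", "content": "\n".join(transcript)},
        ])
        transcript.append(f"Candidate: {answer}")
    return "\n".join(transcript)


def judge_pair(interview_a, interview_b):
    """Judge LLM: pick which of two interviews it believes was AI-generated."""
    prompt = (
        "One of the two job interviews below was generated by an AI system. "
        "Answer with 'A' or 'B' to indicate the AI-generated one.\n\n"
        f"Interview A:\n{interview_a}\n\nInterview B:\n{interview_b}"
    )
    return chat([{"role": "user", "content": prompt}]).strip()
```

Running both generators on the same job description, passing the pair to judge_pair, and counting how often a method's interview is not the one flagged as AI-generated is one way to obtain win rates of the kind reported above; the paper's actual judging protocol may differ in detail.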