Recommendation

Understanding the key ingredients of the Bayesian Truth Serum

Ljerka Ostojic based on reviews by 2 anonymous reviewers

A recommendation of:

STAGE 1

Taking A Closer Look At The Bayesian Truth Serum: A Registered Report

Philipp Schoenegger & Steven Verheyen https://osf.io/xw6hn version https://osf.io/xw6hn

Abstract

Taking A Closer Look At The Bayesian Truth Serum: A Registered Report

Over the past decades, psychology and its cognate disciplines have undergone substantial reform, ranging from advances in statistical methodology to significant changes in academic norms. One aspect of experimental design that has received comparatively little attention is incentivisation, i.e. the way that participants are rewarded and incentivised monetarily for their participation. While incentive-compatible designs are in use in disciplines like economics, the majority of studies in psychology and experimental philosophy are constructed such that individuals’ incentives to maximise their payoffs in many cases counteract their incentives to state their true preferences honestly. This is in part because the subject matter is often self-report data about subjective topics. One mechanism that allows for the introduction of an incentive-compatible design in such circumstances is the Bayesian Truth Serum (Prelec, 2004), which rewards participants based on how surprisingly common their answers are. Recently, Schoenegger (2021) applied this mechanism in the context of Likert-scale self-reports, finding that the introduction of this mechanism significantly altered response behaviour. In this registered report, we further investigate this mechanism by (i) replicating the original result and (ii) teasing out whether the effect may be explainable by an increase in expected earnings or the addition of a prediction task. We take this project to help introduce incentivisation mechanisms into fields where they were not widely used before.

Incentivisation, Bayesian Truth Serum, Methods, Open Science

Submission: posted 06 December 2021
Recommendation: posted 23 April 2022, validated 16 September 2022

Cite this recommendation as:
Ostojic, L. (2022) Understanding the key ingredients of the Bayesian Truth Serum. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=149

Related stage 2 preprints:

Taking A Closer Look At The Bayesian Truth Serum: A Registered Report
Philipp Schoenegger & Steven Verheyen
https://doi.org/10.31219/osf.io/9zvqj

Recommendation

The Bayesian Truth Serum, first introduced by Prelec (2004), rewards participants based on how surprisingly common their own answers are in relation to the actual distribution of answers. As such, it has been suggested as a possible incentive-compatible design for survey studies in different disciplines that rely on participants’ self-reports about their true preferences (Schoenegger, 2021).
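To make "surprisingly common" concrete, the following is a minimal sketch of the scoring rule as described in Prelec (2004): each respondent gives an answer and a prediction of how others will answer, and an answer scores well when its actual share exceeds the geometric mean of the predicted shares. The function name `bts_scores`, the weight `alpha`, and the clipping constant `eps` are illustrative assumptions and are not taken from the Stage 1 report or from Schoenegger (2021).

```python
import math


def bts_scores(answers, predictions, alpha=1.0, eps=1e-6):
    """Sketch of Bayesian Truth Serum scores (after Prelec, 2004).

    answers:     list of chosen option indices, one per respondent
    predictions: list of per-respondent lists with predicted shares per option
    alpha:       weight of the prediction score (assumed value)
    eps:         clipping constant to avoid log(0) (an assumption of this sketch)
    """
    n = len(answers)
    m = len(predictions[0])
    # Actual share of respondents endorsing each option.
    x_bar = [max(sum(1 for a in answers if a == k) / n, eps) for k in range(m)]
    # Log of the geometric mean of the predicted shares for each option.
    log_y_bar = [
        sum(math.log(max(p[k], eps)) for p in predictions) / n for k in range(m)
    ]
    scores = []
    for a, p in zip(answers, predictions):
        # Information score: is the chosen answer more common than predicted?
        info = math.log(x_bar[a]) - log_y_bar[a]
        # Prediction score: penalty for predictions that miss the actual shares.
        pred = sum(
            x_bar[k] * (math.log(max(p[k], eps)) - math.log(x_bar[k]))
            for k in range(m)
        )
        scores.append(info + alpha * pred)
    return scores


# Hypothetical data: three respondents pick option 0, one picks option 1,
# and everyone predicts an even split.
example_scores = bts_scores([0, 0, 0, 1], [[0.5, 0.5]] * 4)
```

In this hypothetical example option 0 turns out to be more common than anyone predicted, so the three respondents who chose it receive higher scores than the fourth.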

In this study, Schoenegger and Verheyen propose to replicate the results reported by Schoenegger (2021) and to directly investigate whether the effect elicited by the manipulations known as the Bayesian Truth Serum is distinct from its separate constituent parts.

The Stage 1 manuscript was evaluated over one round of in-depth review. Based on detailed responses to the reviewers’ comments and edits to the stage 1 report, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).

URL to the preregistered Stage 1 protocol: https://osf.io/dkvms

Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.

List of eligible PCI RR-friendly journals:

References

1. Prelec, D. (2004). A Bayesian Truth Serum for Subjective Data. Science, 306(5695), 462-466. https://doi.org/10.1126/science.1102081

2. Schoenegger, P. (2021). Experimental Philosophy and the Incentivisation Challenge: a Proposed Application of the Bayesian Truth Serum. Review of Philosophy and Psychology. https://doi.org/10.1007/s13164-021-00571-4

3. Schoenegger, P., & Verheyen, S. (2022). Taking A Closer Look At The Bayesian Truth Serum: A Registered Report. Stage 1 Registered Report, in principle acceptance of Version 2 by Peer Community in Registered Reports. https://osf.io/dkvms

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Evaluation round #2

DOI or URL of the report: https://osf.io/xw6hn

Version of the report: https://osf.io/xw6hn

Author's Reply, 23 Apr 2022

Dear Prof. Ostojic,

We have addressed all comments raised in your latest review and hope that the article is now suitable for acceptance. However, if there are any further changes that you would like us to make, please just let us know!

Best,

Philipp

https://doi.org/10.24072/pci.rr.100149.ar2

Decision by Ljerka Ostojic, posted 21 Apr 2022

Many thanks for the thorough and thoughtful revision of the stage 1 report entitled 'Taking a closer look at the Bayesian Truth Serum: a Registered Report', as well as for providing very detailed replies to the reviewers' and my own comments.

I am happy to provide in-principle acceptance (IPA) for this stage 1 report. However, before doing so there are some minor comments regarding the text of the stage 1 report that I would like you to consider in a final revision.

Page 1: In the abstract, it says ii) testing whether the effect is best explained by parts of the mechanism like the increase in expected earnings or the addition of a prediction task. Here, I was wondering whether using the wording that you have later in the report would be more appropriate - maybe something like you have on page 7 would work well?

Page 2: Thank you for adding the footnotes, I found the information you added to greatly improve the clarity and strength of the arguments made. In this footnote here (footnote 3), I was wondering whether there are data on the prevalence of unsuccessful attention checks in studies that are not conducted online as a comparison?

Page 4: In the abstract, you write 'incentive-compatible' using a hyphen, and I think this increases readability, but later on, for example on this page, the same is written as 'incentive compatible'. For readability, it would be great if you could be consistent throughout the text, regardless of the option you prefer.

Page 11: '...ii) analyse any effect of the Bayesian Truth Serum is distinct from an increase in expected earnings that.... ' - should this be analyse whether any effect of the Bayesian Truth Serum is distinct from....?

Page 14: 'As the main analyses will be compromised of seven individual...' - should this be 'As the main analyses will be comprised...'?

Once you have issued a minor revision attending to these points, I will issue IPA without further review.

https://doi.org/10.24072/pci.rr.100149.d2

Evaluation round #1

DOI or URL of the report: https://osf.io/xw6hn

Author's Reply, 23 Feb 2022

https://doi.org/10.24072/pci.rr.100149.ar1

Decision by Ljerka Ostojic, posted 17 Jan 2022

Dear Dr. Schoenegger,

The stage 1 report entitled “Taking a closer look at the Bayesian Truth Serum: a Registered Report” has now been assessed by two reviewers.

Both reviewers highlight the merit of this stage 1 report and the proposed study. Both reviewers also raise some important questions and issues that I would like you to address in a revision.

In addition, I had some concerns about the way that you planned to do your analyses (this concerns the specific comparisons between different groups and the selection of items for the analyses) as well as inferences that you wish to draw based on the results, particularly negative results. Zoltan Dienes has provided an additional review of these issues, and I am pasting his comments here - please take care to address these issues in the revised report.

1. Regarding the inferences based on results showing a non-significant difference between groups:

“The problem with the analysis plan is that it takes non-significance in itself as evidence against there being a difference. What is needed is an inferential procedure that justifies a claim of no effect so that results could actually count against various predictions. See https://psyarxiv.com/yc7s5/ for the typical alternatives and how to approach them: power; equivalence tests of various sorts; Bayes factors.

To justify a conclusion of no effect being there, there needs to be a scientifically motivated indication of what size effect there could be, if there were one. For power and equivalence tests this should be a minimally relevant effect. Thus, if the plan is to continue to use chi square then power is calculated with respect to Cramér's V or w. A problem for the author to address is justifying the minimally relevant effect size. Then power can be calculated; and thus, according to Neyman-Pearson, a non-significant result can be taken as grounds for asserting no difference. No special power is required for PCI RR, but the power of each test should be known, if power is the tool used; but the author may wish to bear in mind that different PCI RR-friendly journals may have requirements (albeit not RSOS). In any case, whatever the journal requirements, "no effect" cannot be concluded without justification.

One possibility, which the author may reject, is to say the BTS is useful in so far as it shifts the mean towards the truthful answer; thus a test of mean differences could be used. Power, equivalence testing and Bayes factors are all easier conceptually. One asks what shift of mean is of minimal interest (for power or equivalence tests); or what shift in mean, or range of shifts, is scientifically plausible (for Bayes factors), which may be the easier question to answer.”
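The power-based route mentioned in the quoted comments can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the power of a chi-square test for Cohen's effect size w is obtained from the noncentral chi-square distribution with noncentrality n·w², implemented here with the standard library only. The function names and the example values (w = 0.2, 400 participants, 6 degrees of freedom, as for a hypothetical two-group comparison on a seven-point scale) are assumptions. For a table with r rows and c columns, w equals Cramér's V multiplied by the square root of min(r − 1, c − 1), so for a two-group comparison w and V coincide.

```python
import math


def gamma_p(a, x):
    """Regularised lower incomplete gamma function P(a, x), via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x, df):
    """CDF of the central chi-square distribution."""
    return gamma_p(df / 2.0, x / 2.0)


def chi2_critical(alpha, df):
    """Critical value of the chi-square test, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < 1.0 - alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi_square_power(w, n, df, alpha=0.05):
    """Power of a chi-square test for effect size w with n observations."""
    crit = chi2_critical(alpha, df)
    half_nc = n * w * w / 2.0  # half of the noncentrality parameter n * w^2
    if half_nc == 0.0:
        return 1.0 - chi2_cdf(crit, df)
    # Noncentral chi-square CDF as a Poisson mixture of central chi-squares.
    cdf, j = 0.0, 0
    while True:
        weight = math.exp(j * math.log(half_nc) - half_nc - math.lgamma(j + 1))
        cdf += weight * chi2_cdf(crit, df + 2 * j)
        if j > half_nc and weight < 1e-12:
            break
        j += 1
    return 1.0 - cdf


# Hypothetical planning values, not taken from the report.
example_power = chi_square_power(w=0.2, n=400, df=6)
```

The same function can be evaluated over a range of sample sizes to find the n at which a non-significant result becomes informative for a chosen minimally relevant w.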

2. Regarding the planned comparisons between different groups and selection of items for these analyses:

“This [refers to the planned comparisons] is problematic because of a selection effect: By selecting extreme scores in the first comparison [concerns the comparison between the BTS group and the main control group], they will naturally tend to get differences in the second [concerns the comparisons between the BTS and additional control groups]. So I would not do the pre-selection. The authors should just look at the overall evidence, for each comparison without preselection.”
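The selection effect described in this comment can be illustrated with a small simulation. It is a sketch under simplifying assumptions (no true effect anywhere, standard normal noise in each group's item score, an arbitrary selection threshold) and does not reproduce the design of the proposed study. Because the BTS group's score enters both comparisons, items selected for a large first difference also tend to show a difference in the second comparison. The function name and all parameter values are hypothetical.

```python
import random


def simulate_preselection(n_items=100_000, threshold=1.5, seed=1):
    """Mean second-comparison difference for all items and for pre-selected items."""
    rng = random.Random(seed)
    all_second, selected_second = [], []
    for _ in range(n_items):
        # Item-level scores under the null: every group has the same true mean.
        bts = rng.gauss(0.0, 1.0)
        control = rng.gauss(0.0, 1.0)
        extra_control = rng.gauss(0.0, 1.0)
        first_diff = bts - control          # BTS vs main control
        second_diff = bts - extra_control   # BTS vs additional control
        all_second.append(second_diff)
        # Pre-selection: keep only items that looked extreme in the first comparison.
        if first_diff > threshold:
            selected_second.append(second_diff)
    mean_all = sum(all_second) / len(all_second)
    mean_selected = sum(selected_second) / len(selected_second)
    return mean_all, mean_selected


mean_all, mean_selected = simulate_preselection()
```

Across all items the second difference averages close to zero, while among the pre-selected items it is clearly positive even though no true effect was simulated.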

Here, I had an additional question: given that with the two new control groups, the idea is to test to what extent alternative explanations can explain the base effect (which statistically manifests itself as a difference in the response distribution between the BTS group and the control group in a specific direction), would it not be useful to compare that difference to the difference between a group in which an alternative manipulation was used and the control group? I think an analysis as the one proposed by Reviewer 2 may be in line with this.

Best wishes,

Ljerka Ostojic

Below I am pasting some minor suggestions concerning the text of the stage 1 report. Most of these are very much suggestions and changes are not necessarily required but may help in ensuring that readers get the most out of the report.

In the Abstract, at the end, it says that under ii) you want to test ‘whether the effect may be explainable by an increase in expected earnings or the addition of a prediction task’. This very much sets the expectation that it is these two explanations that are being primarily considered as possible explanations of the effect. However, implicit in your text is that the primary explanation is that it is the truth incentivising interaction between the instruction and related monetary incentive that elicits the effect, and as such these would be alternative explanations. This is made more explicit in other areas of the report where you explicitly state the term ‘alternative explanations’ and especially ‘the worry that …’. The reader would be greatly aided in understanding the report if this was all standardised and made clearer throughout the text.

The last sentence of the abstract is not very clear – i.e., it is not clear how this relates to what you are testing here.

‘While there have been significant methodological advances in psychology and cognate disciplines recently,…’ This statement should be supported by references, but also maybe explained further, as it is currently not sufficiently clear how it is relevant to the study at hand. In many ways, it is not necessary or useful to state this here at all?

When you state that ‘many papers do not report the compensation fee that was offered to research participants and the fact that these fees vary widely among the papers that do disclose them (e.g., Keith et al., 2017; Rea et al., 2020)’, more information would be useful: Are there actual numbers indicating prevalence available? Is the compensation here meant for the same task/time invested by participants?

‘Perhaps this is due to the null findings reported by the majority of studies that investigated the influence of financial incentives on data quality (e.g., Buhrmester et al., 2011; Crump et al., 2013; Mason & Watts, 2010; Rouse, 2015). There are, however, noteworthy exceptions indicating that increasing financial compensation can improve data quality (Ho et al., 2015; Litman et al., 2015).’ More information here may be useful to the reader.

In the next part of the introduction (but also in a later part, where you write about incentive compatible and incentive incompatible designs), the reader receives information about participants, especially in online studies, clicking through items in surveys rather than engaging with the item content in order to maximise payoff. This is contrasted with the BTS manipulation to incentivise honest answers. What is missing or what I think could be confusing to readers is the step that connects these two issues, because honest answers are not necessarily the only answers that participants may give even when they engage with the items and their content.

‘When participant payments are primarily dependent on completion of the online survey, participants are likely to complete studies as quickly as possible and to complete as many of them as feasible in the time they have available in order to maximise payoffs.’ Are there any data supporting this argument? Given the footnote, do we know how many participants fail the attention checks? Or data on differences between online and in-person studies?

‘The Bayesian Truth Serum works by informing participants that the survey they are about to complete makes use of an algorithm for truth-telling that has been developed by researchers at MIT and has been published in the journal Science. This algorithm will be used to assign survey answers an information score, indicating how truthful and informative the answers are. The respondents with the top-ranking information scores will receive a bonus in addition to the base pay for participation. Participants then go on to answer study items as they normally would,’ For readers who are not so familiar with the BTS manipulation, it would be useful to state more clearly whether the part of ‘This algorithm will be used…’ is part of the instructions given to the participants. On a side note, the actual wording in Schoenegger (2021) differs, so it would be good to state clearly what wording you will use in the proposed study, and also alert the reader to any differences from the Schoenegger (2021) study given that it is these results that you are aiming to replicate.

‘However, participants are only told that they can earn a bonus for answering truthfully and are not informed about the specific mechanisms of the compensation scheme.’ This requires more explanation, especially in light of what instructions the participants actually receive (see a previous comment).

‘to ensure that the results found there generalise to a new sample and effects of the Bayesian Truth Serum are as such also likely to replicate in other people’s implementations.’ Maybe researchers’ instead of people’s?

Will participants who took part in Schoenegger (2021) be able to take part in this study?

https://doi.org/10.24072/pci.rr.100149.d1

Reviewed by anonymous reviewer 2, 15 Dec 2021

https://doi.org/10.24072/pci.rr.100149.rev11

Reviewed by anonymous reviewer 1, 24 Dec 2021

https://doi.org/10.24072/pci.rr.100149.rev12