DOI or URL of the report: https://osf.io/preprints/psyarxiv/tfp2c
Version of the report: Version 10
Two reviewers have evaluated your Stage 2 and are largely happy. They have some minor concerns to address. I just have one further point to make. In the manuscript you point out that you had low power to detect an effect for the crucial interaction half the size of that found in the Pilot. As an effect of half that size may still be interesting, you point out that a non-significant result means one should suspend judgment. In the Discussion you point this out again. That is good. You also claim in the Abstract and Discussion that the interaction failed to replicate. On the one hand, if replication means "getting a significant result", that is true. But many people implicitly take the claim that an effect failed to replicate to mean there was evidence against the effect. Be clear in the Abstract that by failing to replicate you mean there was a significant interaction in the Pilot but not a significant interaction in the main study, though the power means interesting effect sizes may have been missed.
Dear Dr. Zoltan Dienes,
Thanks for your patience. Enclosed please find my review of the stage 2 report. I hope that you and the authors find it helpful.
Best,
Yikang
Overall, I am satisfied with the authors’ reporting of their Stage 2 results. PCI-RR highlights the following questions as important for reviewers to consider at Stage 2: Have the authors provided a direct URL to the approved protocol in the Stage 2 manuscript? Did they stay true to their protocol? Are any deviations from protocol clearly justified and fully documented? Is the Introduction in the Stage 1 manuscript (including hypotheses) the same as in the Stage 2 manuscript? Are any changes transparently flagged? Did any prespecified data quality checks, positive controls, or tests of intervention fidelity succeed? Are any additional post hoc analyses justified, performed appropriately, and clearly distinguished from the preregistered analyses? Are the conclusions appropriately centered on the outcomes of the preregistered analyses? Are the overall conclusions based on the evidence?
Overall, in my view, these questions are answered affirmatively. I just have a few small points, largely revolving around a couple of questions from my Stage 1 review that I feel could still be addressed better.
-One other question first, though. It seems the authors did a great job sticking to their preregistered protocol and analysis plan. Just to make it clear for readers, could the authors clarify whether there were any deviations from their Stage 1 proposed protocol or analyses? Again, it seems that the answer is “no”, which is great, but stating that explicitly would further reassure readers about the transparency of the methods and results.
-For the preliminary study, the authors said in their Stage 1 response that they reported the Nakagawa R^2, but I do not see it. Could the authors make sure it is present and, if it is not, report it? I do see it reported for the replication.
-I feel there is still more that could be said regarding external validity, particularly in light of the replication results. Here and in the Stage 1 response, the authors did a great job of justifying the decision to use the economic self-interest decision-making task rather than a verbal interview and included a nuanced discussion of the trade-offs between the two methods. My point is different from that, though. Instead, my point (also raised in Stage 1) is that any sort of artificial lab task will have difficulty capturing the threats (costs) caused by a real-life disclosure decision in which one’s life or freedom (or the life or freedom of loved ones) is at risk. The difference in severity between the costs of disclosure in real criminal scenarios and the economic costs used in the task could also partially explain why attention to benefits seemed to be a bigger driver of decisions here than the proposed model would suggest. As the authors mention, this sort of artificiality is common and necessary to examine the issue psychologically, so it is not a huge deal, but I would still appreciate a sentence or two to this point in the “external vs. internal validity” discussion subsection.
I thank the authors for their time and efforts.