Thank you for your careful response to the points raised by the reviewers and myself. I am now happy to award in-principle acceptance (IPA). As requested, your submission is being awarded a private Stage 1 acceptance, which will not yet appear on the PCI RR website. Your Stage 1 manuscript has also been registered under the requested 4-year private embargo on the OSF (link below).
URL to the preregistered Stage 1 protocol: https://osf.io/fc9gp
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
DOI or URL of the report: https://osf.io/pxne3/?view_only=66eab29c7acb4aebbcec4631cbcb9217
Version of the report: v1.2
Thank you for your revision, which has addressed a number of issues. The inclusion of the pilot is useful. A key remaining problem is the justification for treating a Cohen's d of 0.5 as the smallest meaningful effect size: why wouldn't a d of 0.4 be practically very important? The choice of this value potentially has major implications for which way the statistical conclusions go; thus it needs to be motivated by the scientific context in order for the conclusions to be relevant to that context. For some ideas on how to motivate this choice, see: https://psyarxiv.com/yc7s5/
Having justified a smallest meaningful effect size, call it S, your decision rule could be simple. For example, in a case where the effect on the DV should be larger than S: if the sample effect is significantly larger than S, conclude that there is an effect of interest; if it is significantly smaller than S, conclude that there is no effect of interest; otherwise, suspend judgment. The relevant power would then be the probability of the first outcome given a predicted effect size and, equally relevant, of the second outcome given an effect of 0.
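To make the rule concrete, here is a minimal sketch in Python of how such a three-way decision could be implemented for two independent groups, assuming S is expressed in raw scale units (if S is specified as a Cohen's d it would first need to be converted using the pooled SD); the data, sample sizes, and value of S are illustrative placeholders, not your design:

```python
# A minimal sketch, not the registered analysis: three-way decision against a
# smallest meaningful raw difference S for two independent groups.
# The data and the value of S are placeholders.
import numpy as np
from statsmodels.stats.weightstats import ttest_ind

rng = np.random.default_rng(0)
treatment = rng.normal(5.4, 1.0, 150)   # placeholder ratings
control   = rng.normal(5.0, 1.0, 150)   # placeholder ratings
S = 0.3        # smallest meaningful difference in raw scale units (illustrative)
alpha = 0.05

# H0: difference <= S vs H1: difference > S  (evidence FOR an effect of interest)
_, p_above_S, _ = ttest_ind(treatment, control, alternative='larger', value=S)
# H0: difference >= S vs H1: difference < S  (evidence AGAINST an effect of interest)
_, p_below_S, _ = ttest_ind(treatment, control, alternative='smaller', value=S)

if p_above_S < alpha:
    print("Effect significantly larger than S: conclude an effect of interest")
elif p_below_S < alpha:
    print("Effect significantly smaller than S: conclude no effect of interest")
else:
    print("Neither: suspend judgment")
```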
DOI or URL of the report: https://osf.io/38hf9/?view_only=66eab29c7acb4aebbcec4631cbcb9217
Dear Dr Espinosa
I now have three reviews of your paper; all reviewers are very positive overall, both about the motivation for the study and about the methods broadly. However, they raise specific concerns, particularly about showing whether the psychometric properties of your scales are up to the job, and about the inferential procedures, specifically concerning the effect size of interest and whether evidence could thus be obtained for no effect.
One question, raised by Aldoh, is whether you would want to pilot your scales to establish their reliability or validity. Alternatively, as suggested by Palfi, you could introduce into the main study "outcome-neutral tests" - or estimates - of the scale properties, so that the main conclusions are conditional on these tests showing adequate psychometric properties. A minor point: for the first scale, how about asking the question in the form "How likely are you to..." with options 0%, 10%, ..., 100%? Would this make the meaning of the response options clearer to subjects (and hence easier for us to interpret)?
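For illustration only, an outcome-neutral reliability check of this kind might look like the sketch below; the data file and the 0.70 criterion are merely the usual convention and hypothetical placeholders, not values taken from your manuscript:

```python
# Illustrative sketch of an outcome-neutral reliability check (Cronbach's alpha);
# the data file and the 0.70 criterion are hypothetical placeholders.
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for responses of shape (n_respondents, n_items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Main (confirmatory) analyses would proceed only if the scale passes the check:
# responses = np.loadtxt("scale_items.csv", delimiter=",")  # hypothetical file
# assert cronbach_alpha(responses) >= 0.70
```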
The other main issue concerns statistical inference. First, a point of clarification. You say:
"We observe that we have a probability of over 80% of detecting an effect if it is greater than or equal to 0.10." What are the units? Likert units or Cohen's d?
In terms of power, you have taken your sample size as the starting point and then asked what effect size that implies for 80% power. As both Palfi and Aldoh asked, why should we be concerned specifically about an effect of 0.1? This point is important in terms of whether a non-significant result would refute your hypothesis. Would a non-significant result allow the conclusion you state follows from it, namely that "the information campaign fails to improving doctors’ views of plant-based diets"? Only if power was calculated with respect to a minimally interesting effect for your research problem.

Aldoh also points out that conclusions based only on the logic of power do not take into account the data as they are actually observed; for example, a significant result may correspond to an effect smaller than 0.1, and a non-significant result may come with a confidence interval that extends beyond 0.1. It is up to you what inferential approach you wish to adopt (i.e. you can stick with a Neyman-Pearson power approach), but some comment on this would be helpful. In effect, Aldoh is raising the possibility of an equivalence-region approach; it would still require justifying a minimally interesting effect size. Palfi wonders whether Bayes factors may be helpful in this regard. Then one needs to specify not the minimally interesting effect, but the effect predicted by a theory. (The theory could be, for example, that any difference is possible within the range of the scale, though smaller effects are more likely than bigger ones.) (Some ideas here may help for any of these approaches: https://psyarxiv.com/yc7s5/ )
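As a purely illustrative sketch (assuming a two-group comparison with effect size expressed as Cohen's d; the group size and the d values are placeholders, not your planned values), the following Python snippet shows how strongly the power claim depends on the effect size it is computed against:

```python
# Minimal sketch: power depends on the effect size it is computed against.
# Sample size and d values are illustrative placeholders.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = 400  # placeholder for the planned group size

for d in (0.10, 0.20, 0.50):
    power = analysis.power(effect_size=d, nobs1=n_per_group,
                           alpha=0.05, ratio=1.0, alternative='two-sided')
    print(f"d = {d:.2f}: power = {power:.2f}")

# Conversely, the n per group needed for 80% power at a minimally interesting d:
n_needed = analysis.solve_power(effect_size=0.10, power=0.80, alpha=0.05,
                                ratio=1.0, alternative='two-sided')
print(f"n per group for 80% power at d = 0.10: {n_needed:.0f}")
```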
If justifying a minimally interesting effect or a predicted effect seems difficult, that may be because this is a situation where one should simply estimate the effect size with its 95% CI. Approached in this way, your conclusion would not be that the intervention does or does not work, but that the estimate of how well it works is such and such.
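Should you take that route, the estimate could be reported along the following lines; this is only a sketch with placeholder data, reporting the raw mean difference (a confidence interval for a standardized effect such as Cohen's d would require the noncentral t distribution or bootstrapping):

```python
# Minimal sketch of the estimation approach: report the mean difference with its
# 95% CI rather than a binary works/does-not-work conclusion. Data are placeholders.
import numpy as np
from statsmodels.stats.weightstats import CompareMeans, DescrStatsW

rng = np.random.default_rng(1)
treatment = rng.normal(5.4, 1.0, 150)  # placeholder ratings
control   = rng.normal(5.0, 1.0, 150)  # placeholder ratings

cm = CompareMeans(DescrStatsW(treatment), DescrStatsW(control))
diff = treatment.mean() - control.mean()
low, upp = cm.tconfint_diff(alpha=0.05, usevar='pooled')
print(f"Estimated difference = {diff:.2f}, 95% CI [{low:.2f}, {upp:.2f}]")
```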
The exact resolution of this inferential issue could go in several directions; however, both the reviewers and I thought it needed more work.
Tables S1 and S2: the column labeled "Beta" - do you mean "power"?
I look forward to receiving a revised manuscript that addresses these issues and the others that the reviewers raised.