DOI or URL of the report: https://osf.io/ea8uq?view_only=eb70b9bd53fd4b508fce42be1a33012c
Version of the report: 1
Dear authors,
Thank you very much for submitting your manuscript to PCI-RR. I have now heard from two referees and have also read your paper in detail. I find this a very nice project, and it is clear that you put a lot of effort into drafting the manuscript. Thank you for this.
I propose a round of revision to take into account the referees' comments (and possibly mine) to improve your manuscript.
Please let me know should anything be unclear.
Best regards,
Romain
-------
• The Participants section refers to several elements that are only introduced in the Design section. I would advise placing the Design section before the Participants section. For instance, you mention Experimental Monetary Units (EMU) before you have introduced them.
• Please define a procedure for handling potential floor or ceiling effects. The very purpose of an RR is to specify in advance what you plan to do, so we should avoid, as much as possible, leaving this to ex-post decisions. (Page 18)
• I do not understand why you do not want to correct for multiple comparisons in your registered analyses. By definition, correcting for multiple hypothesis testing requires knowing how many tests you will run. This is only possible with registered hypotheses (as we do not know what exploratory analyses you will do and how many of them you will report). You say in the design table that you do not need to correct because "no related analyses are conducted using this variable". However, you look at the same dependent variable in three hypotheses (H1, H2, H3), with the same dataset. In my view, these hypotheses belong to the same family of hypotheses (OXT has an effect), which would justify a correction.
• As a side note: I think that you can use a linear model to analyze the investment decision (possibly a Tobit model if you fear ceiling or floor effects) because the investment decision is a number of tokens. I do not see why you would need a non-linear model here.
• Page 21: You say "In case the effect of OXT on investments falls within the range ..." The design table refers to the confidence interval, which is clearer than this section. I would suggest expanding this section to match.
• Why is alpha set to 0.02 in the power analysis?
• The power analysis is conducted for H1 only. The main issue is that heterogeneity analyses (which is the case here for H2 and H3) require more observations to maintain sufficiently high statistical power. It would be nice to compute the expected statistical power for hypotheses 2 and 3.
• Design Table: I fear that the interpretation in the first line is a bit too strong. If you fail to reject the null for H1, does it really contradict the theory, or does it mean only that you failed to find supporting evidence for the theory?
• Last, an important issue for me is the clustering of standard errors. Participants in your experiment spend some time discussing in small groups. This discussion session inevitably generates dependence between observations within the same session. In this case, we would usually cluster the standard errors at the session level. Note that clustering will drastically reduce your degrees of freedom, but it seems the correct way to proceed.
• I agree with the anonymous referee who suggests putting the code of the power analysis online or in the manuscript.
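The family-wise correction raised above for H1, H2 and H3 could, for instance, be implemented as a Holm step-down adjustment. The following is only a minimal sketch: the choice of Holm (rather than another procedure) is an assumption, and the three p-values are hypothetical placeholders, not results.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p cannot fall below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values for the three registered hypotheses (illustration only).
example = {"H1": 0.012, "H2": 0.030, "H3": 0.045}
adjusted = dict(zip(example, holm_adjust(list(example.values()))))

for name, p_adj in adjusted.items():
    print(f"{name}: raw p = {example[name]:.3f}, Holm-adjusted p = {p_adj:.3f}")
```

Each adjusted p-value would then be compared with the registered alpha. Because the number of tests in the family is fixed in advance, the adjustment can be fully specified at Stage 1.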
This is an important replication study that will follow on from the exploratory findings of Declerck et al. (2021). The proposed study is a nice demonstration of how exploratory analyses (i.e., the findings from Declerck et al.) can lead to future hypothesis-driven work. Pooling the newly collected data with the previously reported data from Declerck et al. is also a nice idea.
The term “minimal effects analysis” in the abstract is unclear to me. Do you mean equivalence testing?
“The hormone oxytocin (OXT) is a nine amino acid neuropeptide that is synthesized in the hypothalamus…” It is mainly synthesized in the hypothalamus; it is also synthesized at other sites in the body, but in much smaller amounts.
It should be briefly noted why small sample sizes bias estimates of true effect sizes. It may also help to describe the types of effect sizes the original study was powered to reliably detect.
“there is the possibility that OXT may have a smaller effect, perhaps limited to particular subpopulations”: This brings to mind a recent paper describing the importance of recognising the heterogeneity of study populations (https://www.nature.com/articles/s41562-021-01143-3). However, I will leave it up to the authors whether they wish to mention this paper and/or the broader issue of recognising population heterogeneity.
“…and applying proper statistical techniques to improve interpretability”: some examples of these techniques should be named.
“…which they suspect (and we confirmed) to result from a clipped or misprinted aspect of the figure”: How was this confirmed? By the authors?
Table 1 - there is a gap just before “inattentiveness”
I like the authors’ approach to the power analysis (i.e., simulations). Is this code available? Perhaps I missed the link? I would recommend posting this code on OSF.
Regarding dose, I understand the choice of 24IU, given this is a replication, but potential dose-dependent effects of oxytocin should also be mentioned in the article.
Figure 1b - It would be helpful to use a slightly larger font for the axis tick labels.
“where it can cross over to the hypothalamus…” It would be more accurate to say something like “where it can travel to the hypothalamus…”
Why was alpha set to .02 and not .025?
“We used extracted probabilities for the placebo condition from Kosfeld et al. and assumed a more reasonable true minimal effect size, a Cohen’s d =.2.” This Cohen’s d value seems reasonable, but can the authors provide a justification for this particular value?
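To make the two questions above concrete (alpha of .02 versus .025, and d = .2), here is a rough sketch of how the required sample size moves with alpha. It assumes a one-sided two-sample comparison with equal group sizes and a normal approximation; it is not the authors' simulation, and the 90% power target is a hypothetical placeholder.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(d, n_per_group, alpha):
    """Approximate power of a one-sided two-sample test (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    # Expected value of the test statistic under the alternative.
    noncentrality = d * sqrt(n_per_group / 2)
    return STD_NORMAL.cdf(noncentrality - z_crit)


def n_for_power(d, alpha, target=0.90):
    """Smallest n per group reaching the target power (target is hypothetical)."""
    n = 2
    while approx_power(d, n, alpha) < target:
        n += 1
    return n


for alpha in (0.02, 0.025):
    n = n_for_power(d=0.2, alpha=alpha)
    print(f"alpha = {alpha}: about {n} participants per group for d = 0.2")
```

The same function can be evaluated at smaller values of d to see how quickly the required sample grows when the assumed minimal effect is reduced.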
“When using the present experimental design for a different population, it should be kept in mind that OXT administration can induce uterine contractions”: While the risk of inducing uterine contractions in pregnant women is certainly a consideration for not including females, this is usually mitigated by administering pregnancy tests prior to administration. I would assume the more likely reason that females have not been usually included is the potential impact of different hormone levels across the menstrual cycle on oxytocin effects.
“Our experimental procedure will be partially based on the replication by Declerck and colleagues, with adjustments made to serve the purpose of our study”. I think it would be very useful to have a table or text box that summarizes the adjustments made and why these were made.
“…with levels of chronic nasal obstruction as a covariate”: The method of evaluation is only described further down, but it should be introduced earlier.
What is the purpose of the saliva samples? Are you evaluating oxytocin receptor SNPs and peripheral oxytocin levels? It seems oxytocin concentrations are being evaluated pre and post administration, but the utility of measuring oxytocin in saliva after intranasal administration is compromised by the “drip down” of exogenous oxytocin from the nasal cavity to the oral cavity. So rather than measuring circulating levels of oxytocin, this approach mainly measures exogenous oxytocin. Indeed, after intranasal administration, saliva oxytocin levels are not related to peripheral levels measured in blood plasma (https://doi.org/10.1016/j.yhbeh.2018.05.004). Are there any predictions regarding oxytocin receptor SNPs and the effects of oxytocin on trusting behaviors?
“Consequently, the upper bound (ΔU) will be set to d =.33, and the lower bound (ΔL) will be set to d =-.33 (testing one-sided).” I understand the justification for these equivalence test bounds, but these bounds are relatively large. The median summary effect size across meta-analyses of oxytocin administration studies is 0.14, and this doesn’t even account for publication bias, so the “true” effect is likely smaller (https://doi.org/10.1016/j.cpnec.2020.100014).
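The consequence of the bound width can be illustrated with a small sketch of an equivalence test on a standardized mean difference. This assumes a large-sample normal approximation and the usual approximate standard error for Cohen's d; the observed effect and group sizes in the example are hypothetical, and alpha = .02 is taken from the value queried earlier in this report.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def se_cohens_d(d, n1, n2):
    """Large-sample approximation to the standard error of Cohen's d."""
    return sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))


def tost_p_values(d_obs, n1, n2, lower=-0.33, upper=0.33):
    """One-sided p-values against the lower and the upper equivalence bound."""
    se = se_cohens_d(d_obs, n1, n2)
    p_lower = 1 - STD_NORMAL.cdf((d_obs - lower) / se)  # H0: d <= lower
    p_upper = STD_NORMAL.cdf((d_obs - upper) / se)      # H0: d >= upper
    return p_lower, p_upper


def is_equivalent(d_obs, n1, n2, alpha=0.02, lower=-0.33, upper=0.33):
    """Equivalence is claimed only if both one-sided tests reject."""
    return max(tost_p_values(d_obs, n1, n2, lower, upper)) < alpha


# Hypothetical inputs: an observed d of 0.14 with 200 participants per group.
for bound in (0.33, 0.20):
    p_low, p_up = tost_p_values(0.14, 200, 200, lower=-bound, upper=bound)
    print(f"bounds ±{bound}: p_lower = {p_low:.4f}, p_upper = {p_up:.4f}")
```

Narrower bounds make the upper one-sided test harder to reject for the same data, so the choice of bounds directly determines which effect sizes the study can rule out.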
“We will report the distribution of investment and will take potential ceiling effects into account in our statistical analyses.” Can the authors provide a suggestion for how this will be taken into account, if necessary?
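One option that could be pre-specified is a two-limit Tobit model, which treats investments at the minimum or maximum as censored observations. The sketch below shows only the log-likelihood such a model would maximize; the limits of 0 and 12 and the example data are hypothetical placeholders, and the optimizer and predictors are left to the authors.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def tobit_loglik(y, mu, sigma, lower, upper):
    """Log-likelihood of outcomes y under a two-limit Tobit model.

    y     : observed outcomes, censored at `lower` and `upper`
    mu    : predicted latent means, one per observation
    sigma : standard deviation of the latent normal error
    """
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        if y_i <= lower:
            # Floor: probability that the latent outcome is at or below the limit.
            total += log(STD_NORMAL.cdf((lower - mu_i) / sigma))
        elif y_i >= upper:
            # Ceiling: probability that the latent outcome is at or above the limit.
            total += log(1 - STD_NORMAL.cdf((upper - mu_i) / sigma))
        else:
            # Interior observation: ordinary normal density.
            total += log(STD_NORMAL.pdf((y_i - mu_i) / sigma) / sigma)
    return total


# Hypothetical investments on a 0-12 scale, with a common predicted mean.
investments = [12, 12, 8, 4, 12, 0, 9]
for mean in (7.0, 9.0):
    ll = tobit_loglik(investments, [mean] * len(investments), 4.0, 0, 12)
    print(f"latent mean {mean}: log-likelihood = {ll:.3f}")
```

In practice the latent mean would be a linear function of treatment and covariates, and the coefficients and sigma would be estimated by maximizing this function.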