Does oxytocin enhance pro-social behaviors?

based on reviews by Thibaut Arpinon and 1 anonymous reviewer
A recommendation of:

Oxytocin, individual differences, and trust game behavior: a registered large-scale replication

Submission: posted 25 November 2022
Recommendation: posted 19 April 2023, validated 21 April 2023
Cite this recommendation as:
Espinosa, R. (2023) Does oxytocin enhance pro-social behaviors?. Peer Community in Registered Reports, .


Trust is fundamental to the proper functioning of society. By achieving high levels of trust, societies can reduce monitoring costs (e.g., contract supervision, conflict resolution) and increase investments in more profitable but socially riskier projects. Trust serves as a way to foster cooperation among individuals and can benefit the entire social group.
As a result, numerous researchers have sought to explore the biological foundations of trust. In particular, several research teams have tried to explore the role of oxytocin in modulating prosocial behaviors. However, previous research has yielded mixed findings, and several methodological issues, such as low statistical power, have hindered the development of a definitive view on the matter.
In the current study, Kroll et al. (2023) seek to replicate the seminal study of Kosfeld et al. (2005). Kosfeld and co-authors originally found that the intranasal administration of oxytocin can significantly increase trust among individuals. However, recent replication evidence shows no general effect of oxytocin on pro-social behaviors or suggests that the effect might be heterogeneous among social groups (Declerck et al., 2020). The objective of the current study of Kroll et al. (2023) is twofold. First, the authors propose to replicate the original study of Kosfeld et al. (2005) and to investigate whether the lack of replication can be due to an overall effect that is smaller than the original findings suggest. Second, the authors propose to replicate recent but exploratory results on the heterogeneous effect of oxytocin on pro-social choices. 
The Stage 1 manuscript was evaluated over one round of in-depth review. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol:
Level of bias control achieved: Level 4. At least some of the data/evidence that will be used to answer the research question already exists AND is accessible in principle to the authors (e.g. residing in a public database or with a colleague) BUT the authors certify that they have not yet accessed any part of that data/evidence.
List of eligible PCI RR-friendly journals:
References:
1. Declerck, C. H., Boone, C., Pauwels, L., Vogt, B., & Fehr, E. (2020). A registered replication study on oxytocin and trust. Nature Human Behaviour, 4(6), 646–655.
2. Kosfeld, M., Heinrichs, M., Zak, P. J., Fischbacher, U., & Fehr, E. (2005). Oxytocin increases trust in humans. Nature, 435(7042), 673–676.
3. Kroll, C. F., Schruers, K., Viechtbauer, W., Vingerhoets, C., Seidel, L., Riedl, A., & Hernaus, D. (2023). Oxytocin, individual differences, and trust game behavior: a registered large-scale replication, in principle acceptance of Version 2 by Peer Community in Registered Reports.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Evaluation round #1

DOI or URL of the report:

Version of the report: 1

Author's Reply, 07 Apr 2023

Decision by , posted 13 Jan 2023, validated 15 Jan 2023

Dear authors,

Thank you very much for submitting your manuscript to PCI-RR. I have now heard from two referees and I have also read your paper in detail. I believe your paper to be a very nice project and think you devoted a lot of effort to drafting your manuscript. Thank you for this.

I propose a round of revision to take into account the referees' comments (and possibly mine) to improve your manuscript.

Please let me know should anything be unclear.

Best regards,



• The Participants section mentions several elements that are only introduced in the Design section. I would advise putting the Design section before the Participants section. For instance, you talk about Experimental Monetary Units (EMU) that you haven't introduced yet.

• Please define a procedure for potential floor or ceiling effects. I think that the very purpose of an RR is to specify in advance what you plan to do. So, we should avoid, as much as possible, leaving it to ex-post decisions. (Page 18)

• I do not understand why you do not want to correct for multiple comparisons in your registered analyses. By definition, correcting for multiple hypothesis testing requires knowing how many tests you will run. This is only possible with registered hypotheses (as we do not know what exploratory analyses you will do and how many of them you will report). You say in the design table that you do not need to correct because "no related analyses are conducted using this variable". Well, you look at the same dependent variable in three hypotheses (H1, H2, H3), with the same dataset. In my view, these hypotheses are part of the same family of hypotheses (OXT has an effect), which would justify correction.

• As a side note: I think that you can use a linear model to analyze the investment decision (possibly a Tobit model if you fear ceiling or floor effects) because the investment decision is a number of tokens. I do not see why you would need a non-linear model here.

• Page 21: You say "In case the effect of OXT on investments falls within the range ..." You talk about the confidence interval in the design table, which is clearer than this section. I would suggest complementing this section accordingly.

• Why set alpha = 0.02 in the power analysis?

• The power analysis is conducted on H1 only. The main issue is that heterogeneity analyses (which is the case here for H2 and H3) require more observations to maintain sufficiently high statistical power. It would be nice to compute the expected statistical power for hypotheses 2 and 3.

• Design Table: I fear that the interpretation is a bit too strong for the first line. If you fail to reject H1, does it really mean that it contradicts the theory, or does it mean instead that you fail to find supporting evidence for the theory?

• Last, an important issue for me is the clustering of standard errors. Participants in your experiment spend some time discussing in small groups. This discussion session inevitably generates dependence between observations within the same session. In this case, we would usually cluster the standard errors at the session level. Note that clustering will drastically reduce your degrees of freedom, but it seems the correct way to proceed.

• I agree with the anonymous referee who suggests putting the code of the power analysis online or in the manuscript.
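To make the multiple-comparisons point above concrete, here is a minimal sketch of a Holm correction applied to a family of three registered tests. The p-values are hypothetical placeholders, not results from the study, and Holm is only one possible procedure (the authors may prefer another); the function name `holm_adjust` is introduced here for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted p-values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for the three registered hypotheses.
p_raw = {"H1": 0.012, "H2": 0.030, "H3": 0.041}
p_adj = dict(zip(p_raw, holm_adjust(list(p_raw.values()))))

for name, p in p_adj.items():
    print(f"{name}: raw = {p_raw[name]:.3f}, Holm-adjusted = {p:.3f}")
```

With these placeholder values, all three tests would pass an uncorrected 0.05 threshold, but only H1 would survive the correction, which illustrates why the size of the family needs to be fixed at registration.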




Reviewed by Thibaut Arpinon, 26 Dec 2022

Reviewed by anonymous reviewer 1, 09 Jan 2023

This is an important replication study that will follow on from the exploratory findings from Declerck et al. (2020). This proposed study is a nice demonstration of how exploratory analyses (i.e., findings from Declerck et al.) can lead to future hypothesis-driven work. Pooling the new data collected with the previously reported data from Declerck et al. is also a nice idea.

The term “minimal effects analysis” in the abstract is unclear to me. Do you mean equivalence testing?

“The hormone oxytocin (OXT) is a nine amino acid neuropeptide that is synthesized in the hypothalamus…” Although it is mainly synthesized in the hypothalamus, it is also synthesized at other sites in the body, but in much smaller amounts.

It should be briefly noted why small sample sizes bias true effect sizes. It may also help to describe the types of effect sizes the original study was powered to reliably detect.

“there is the possibility that OXT may have a smaller effect, perhaps limited to particular subpopulations”: This brings to mind a recent paper describing the importance of recognising the heterogeneity of study populations ( However, I will leave it up to the authors if they wish to mention this paper and/or the broader issue of recognising population heterogeneity

“…and applying proper statistical techniques to improve interpretability”: some examples of these techniques should be named.

“…which they suspect (and we confirmed) to result from a clipped or misprinted aspect of the figure”: How was this confirmed? By the authors?

Table 1 - there is a gap just before “inattentiveness”

I like the authors’ approach for power analysis (i.e., simulations). Is this code available? Perhaps I missed the link? I would recommend posting this code on OSF.
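For readers unfamiliar with simulation-based power analysis, the following is a minimal sketch of the general idea, not the authors' code. It assumes a normally distributed outcome with unit variance, a standardized effect of d = 0.2, a one-sided test at alpha = 0.02 using a normal approximation, and illustrative per-group sample sizes; all of these are assumptions made here, whereas the manuscript's simulations draw on extracted probabilities from Kosfeld et al.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance


def simulated_power(n_per_group, d=0.2, alpha=0.02, n_sims=500, seed=1):
    """Estimate power as the share of simulated studies that reject H0."""
    rng = random.Random(seed)
    # One-sided critical value under a normal approximation (assumption).
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(n_sims):
        # Simulate one study: placebo centered at 0, treatment shifted by d.
        placebo = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(d, 1.0) for _ in range(n_per_group)]
        se = sqrt(variance(placebo) / n_per_group
                  + variance(treated) / n_per_group)
        z = (fmean(treated) - fmean(placebo)) / se
        if z > z_crit:
            rejections += 1
    return rejections / n_sims


# Illustrative (hypothetical) sample sizes per group.
power_small = simulated_power(50)
power_large = simulated_power(400)
print(f"n = 50 per group:  estimated power = {power_small:.2f}")
print(f"n = 400 per group: estimated power = {power_large:.2f}")
```

The same loop can be extended to an interaction term to approximate power for heterogeneity hypotheses, which typically need considerably more observations than a main effect.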

Regarding dose, I understand the choice of 24IU, given this is a replication, but potential dose-dependent effects of oxytocin should also be mentioned in the article.

Figure 1b - It would be helpful to give the axis tick labels a slightly larger font

“where it can cross over to the hypothalamus…” Would be more accurate to say something like “where it can travel to the hypothalamus…”

Why was alpha set to .02 and not .025?

“We used extracted probabilities for the placebo condition from Kosfeld et al. and assumed a more reasonable true minimal effect size, a Cohen’s d =.2.” This Cohen’s d value seems reasonable, but can the authors provide a justification for this particular value?

“When using the present experimental design for a different population, it should be kept in mind that OXT administration can induce uterine contractions”: While the risk of inducing uterine contractions in pregnant women is certainly a consideration for not including females, this is usually mitigated by administering pregnancy tests prior to administration. I would assume the more likely reason that females have not been usually included is the potential impact of different hormone levels across the menstrual cycle on oxytocin effects.

“Our experimental procedure will be partially based on the replication by Declerck and colleagues, with adjustments made to serve the purpose of our study”. I think it would be very useful to have a table or text box that summarizes the adjustments made and why these were made.

“…with levels of chronic nasal obstruction as a covariate”: The method of evaluation is only described further down, but it should be introduced earlier.

What is the purpose of the saliva samples? Are you evaluating oxytocin receptor SNPs and peripheral oxytocin levels? It seems oxytocin concentrations are being evaluated pre and post administration, but the utility of measuring oxytocin in saliva post intranasal administration is compromised by the “drip down” of exogenous oxytocin from the nasal cavity to the oral cavity. So rather than measuring circulating levels of oxytocin, this approach mainly measures exogenous oxytocin. Indeed, after intranasal administration, saliva oxytocin levels are not related to peripheral levels measured in blood plasma ( Are there any predictions regarding oxytocin receptor SNPs and the effects of oxytocin on trusting behaviors?

“Consequently, the upper bound (ΔU) will be set to d =.33, and the lower bound (ΔL) will be set to d =-.33 (testing one-sided).“ I understand the justification for these equivalence test bounds, but these bounds are relatively large. The median summary effect size for oxytocin administration study meta-analyses is 0.14, and this doesn’t even account for publication bias, so the “true” effect is likely smaller (
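To illustrate why the width of the equivalence bounds matters, here is a minimal sketch of a two one-sided tests (TOST) procedure on a standardized mean difference. It uses a normal approximation to the sampling distribution of Cohen's d, and the observed effect and sample sizes are hypothetical placeholders; it is not the authors' registered analysis.

```python
from math import sqrt
from statistics import NormalDist


def tost_cohens_d(d_obs, n1, n2, bound):
    """Return the TOST p-value for equivalence within [-bound, +bound]."""
    # Approximate standard error of Cohen's d for two independent groups.
    se = sqrt((n1 + n2) / (n1 * n2) + d_obs ** 2 / (2 * (n1 + n2)))
    # Test against the lower bound: H0 is d <= -bound.
    p_lower = 1 - NormalDist().cdf((d_obs + bound) / se)
    # Test against the upper bound: H0 is d >= +bound.
    p_upper = NormalDist().cdf((d_obs - bound) / se)
    # Equivalence is claimed only if both one-sided tests reject.
    return max(p_lower, p_upper)


# Hypothetical observed effect and group sizes.
d_obs, n1, n2 = 0.05, 110, 110

p_wide = tost_cohens_d(d_obs, n1, n2, bound=0.33)    # bounds from the manuscript
p_narrow = tost_cohens_d(d_obs, n1, n2, bound=0.14)  # median meta-analytic effect
print(f"bounds of 0.33: TOST p = {p_wide:.3f}")
print(f"bounds of 0.14: TOST p = {p_narrow:.3f}")
```

Under these placeholder inputs, the same small observed effect supports equivalence with the wider bounds but not with the narrower ones, so the choice of bounds largely determines what a "null" result can be taken to show.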

“We will report the distribution of investment and will take potential ceiling effects into account in our statistical analyses.” Can the authors provide a suggestion for how this will be taken into account, if necessary?
