DOI or URL of the report: https://osf.io/rgtv5?view_only=903466f6ec4343c09ec26bd52698a2d7
Version of the report: 1
Dear Dr. Fillon,
Thank you for facilitating the review process of our Stage-1 RR. We found your and the reviewers' advice and suggestions extremely helpful and have substantially updated our manuscript accordingly. Below, we provide point-by-point replies to all of the concerns and suggestions raised and specify the changes made to the manuscript to address them.
E: …I would like to have an explanation in the last table about what could happen if the HMC does not work, but also if we do find difference between secular and religious in secure environment. I read carefully the paragraph above figure 1 and I have not seen anything regarding this condition and wonder if it would not be a necessary condition to make the rest of the analysis work. To be clear, if you find that in the secure condition, religious people are more normative than secular people, how can you ensure that you will have the possibility to detect a difference in the insecure condition?
Regarding a failed HMC: As a quick reminder, we will manipulate two factors: environment (secure vs insecure) and institution (religious vs secular). The first factor manipulates a threat to cooperation (others withdraw from the common pool), and the second manipulates the nature of the institutions participants may choose from to mitigate this threat (they can join either secular or religious normative groups). Our main dependent variable assesses whether participants chose to join a normative group or not.
Our original manipulation check (HMC) assumed that participants in the insecure condition would perceive that others would withdraw higher amounts from the common pool than would participants in the secure condition (i.e., they would feel more threatened by free-riding). This perception should prompt participants in the insecure condition to choose normative groups (be they religious or secular).
When addressing the concern of potential HMC failure during revision, we realized that such a failure might be caused by noise in participants' self-reported expectations about others' game behavior. While we expect these self-reports to be reflected in participants' choice of normative groups, this is not a necessary precondition: we could observe no support for the original HMC and still obtain a valid test of H1 (participants playing differently than they said they would). Conversely, we could confirm the original HMC but then observe no effect of our insecurity manipulation on group choice in the secular condition due to design issues or parameter setup. Indeed, if there were no effect of the insecurity manipulation in the secular condition, we could hardly expect one in the religious condition.
To alleviate this concern, we shifted our main manipulation check from the perceptual to the behavioral level. Namely, the new manipulation check expects that participants in the secular condition will be more likely to choose the normative group in the insecure than in the secure condition. We believe this is a more severe manipulation check because the absence of this difference would mean that we either did not manipulate insecurity strongly enough or the institution designed to mitigate insecurity was not sufficiently appealing or functional. Note that all our original hypothesis tests assumed this difference is present in the secular condition (but not necessarily in the religious condition); hence, shifting the HMC made this crucial assumption explicitly testable. We will retain the measure of expected contributions by others (our original HMC) as an additional test of our manipulation and a potential explanatory variable for further exploratory analyses, but the validity of our tests no longer hinges on this variable.
To assess whether the new HMC would likely be supported, we conducted several probes of different parameter values of the Nash demand game (see the main text for details), revealing that our originally proposed parameter values would have produced an insufficient manipulation (there was still considerable insecurity in the secure condition). Thus, we set the game's parameters such that there would be a strong contrast between the secure and insecure environments in the secular condition and conducted a pilot study as advised by R2. The pilot study showed that participants were sensitive to our manipulation, and we included the methodology and results of the pilot study in the manuscript. Specifically, we recruited 50 US and 50 Polish participants in the secular condition and found a well-estimated effect of the environment manipulation on the choice of normative groups (β = 1.56; 95% CI = [0.56 – 2.55]). This difference translated into a 16% probability of choosing the normative group in the secure condition and a 47% probability in the insecure condition. The effect was in the same direction in both the US and Polish samples.
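For readers who wish to verify how the logit coefficient maps onto these probabilities, a minimal sketch of the conversion (Python here purely for illustration; the full analysis is in the Supplementary R code):

```python
import math

def logit(p):
    # log-odds of a probability
    return math.log(p / (1 - p))

def inv_logit(x):
    # probability corresponding to a log-odds value
    return 1 / (1 + math.exp(-x))

# pilot probabilities of choosing the normative group (secular condition)
p_secure, p_insecure = 0.16, 0.47
beta = logit(p_insecure) - logit(p_secure)
print(round(beta, 2))  # ~1.54, matching the reported 1.56 up to rounding of the pilot probabilities
print(round(inv_logit(logit(p_secure) + 1.56), 2))  # ~0.48: implied insecure-condition probability
```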
Moreover, we found that participants in the insecure condition expected others to be more likely to withdraw money from the common pool than did participants in the secure condition (our original manipulation check; β = 1.23; 95% CI = [0.43 – 2.85]). These pilot results afford substantial confidence that our manipulation is effective and produces the expected differences (see the main text and Supplementary R code for further supportive results). We also recruited 50 US participants in the religious condition to check whether participants would report any issues with our manipulation and whether we would encounter floor effects that would preclude testing H1. While the sample was too small for any inference, we observed that 8% of participants chose the normative group in the secure condition and 10% in the insecure condition. Participants reported no issues with the manipulation, and we are now confident that our manipulation will allow us to test H1 properly.
Nonetheless, despite this initial support, it is possible that the HMC will not be supported in the actual study with a larger sample size, especially given that we set the minimal effect of our manipulation at a 15 percentage-point (%pt.) difference between the secure and insecure conditions. If we observed a smaller effect, we would test whether the HMC holds in a subsample of our data, namely in prosocial participants. As we explain in the manuscript, free-riders may introduce noise into group choice in the insecure condition; hence, a more lenient HMC states that prosocial participants in the secular condition will be more likely to select the normative group in the insecure than in the secure condition and that this difference will exceed 15 %pt. If the 95% CI for the HMC included zero, we would add group choice from the reversed environment (i.e., add a repeated measure) to increase our statistical power. These follow-up steps are now also reported in Table 1 and illustrated in the Supplementary R code.
Regarding the difference between the secular and religious conditions in the secure environment: this is a very good point that, to some extent, still applies to our current H1 test. Before addressing it, we would like to explain another change in our hypotheses. Originally, we proposed to test whether participants in the religious condition would be more likely to choose a normative group in the insecure than in the secure environment (HA1 standing for the rationalization theory, which expects no effect, vs HB1 standing for the existential insecurity theory, which expects an effect). This hypothesis remains our main hypothesis in the current manuscript (H1). However, we also proposed that if HB1 were not supported, we would require a significant interaction between the environment and institution factors to support the rationalization theory and to show that our insecurity manipulation worked in the secular condition (originally HA2). Since this is now our manipulation check, we no longer need to add this condition to the test of the rationalization theory. Of course, it could still be argued that even if we supported the HMC and failed to support H1, there would not necessarily be a statistical difference between the effects of insecurity in the religious and secular conditions. For this reason, we set the minimal effect size for the HMC larger than for H1 (15 %pt. vs 10 %pt.). We ran 500 simulations with parameters set up to support the HMC and fail H1 and found that in 85% of cases, we would find an interaction effect with 95% CIs excluding zero. Thus, while we plan to include this interaction in our exploratory analyses (the interaction is still interesting for understanding whether the effect in the secular condition is larger or smaller than the effect in the religious condition), it is not a necessary test for distinguishing between the rationalization and existential insecurity theories.
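For transparency, the logic of these simulations can be sketched as follows (a simplified Python analogue; the cell probabilities and sample sizes below are illustrative assumptions, so the printed detection rate need not match the 85% reported for our actual setup; the full simulation code is in the Supplementary R code):

```python
import math
import random

rng = random.Random(42)

def sim_cell(n, p):
    # number of participants choosing the normative group in one cell
    return sum(rng.random() < p for _ in range(n))

def interaction_ci(cells):
    # cells maps (environment, institution) -> (choosers, cell size).
    # For a saturated logistic model on a 2x2 design, the MLE of the
    # interaction is the difference-in-differences of empirical logits,
    # and its Wald SE is the sqrt of the summed reciprocal cell counts.
    def logit(k, n):
        return math.log(k / (n - k))
    est = ((logit(*cells[(1, 1)]) - logit(*cells[(0, 1)]))
           - (logit(*cells[(1, 0)]) - logit(*cells[(0, 0)])))
    se = math.sqrt(sum(1 / k + 1 / (n - k) for k, n in cells.values()))
    return est - 1.96 * se, est + 1.96 * se

# illustrative cell probabilities: the HMC holds in the secular condition
# (16% -> 31%), while H1 fails in the religious condition (8% -> 8%);
# keys: (environment: 0 secure / 1 insecure, institution: 0 secular / 1 religious)
p = {(0, 0): 0.16, (1, 0): 0.31, (0, 1): 0.08, (1, 1): 0.08}
n = {(0, 0): 175, (1, 0): 175, (0, 1): 225, (1, 1): 225}

sims, hits = 500, 0
for _ in range(sims):
    cells = {k: (sim_cell(n[k], p[k]), n[k]) for k in p}
    lo, hi = interaction_ci(cells)
    hits += (hi < 0) or (lo > 0)  # 95% CI excludes zero
print(f"interaction detected in {hits / sims:.0%} of simulations")
```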
Given these new hypotheses, we do not need to directly compare the religious and secular conditions in the secure environment, and our main hypothesis test can remain agnostic about this difference. However, the probability of normative-group choice in the religious-secure condition will affect our statistical power to detect the SESOI for H1. Since we are dealing with a difference in percentages bounded between 0 and 1, the higher the intercept (that is, the % choosing the normative group in the religious-secure condition), the less likely we are to observe a 10 %pt. difference in the religious-insecure condition (e.g., an increase from 5% to 15% is a large effect, tripling the original percentage, whereas the same 10 %pt. difference from 55% to 65% is only a 1.18-fold increase). Thus, we purposefully set our game parameters such that the probability of choosing the normative group would be low in the secure condition, giving us a better chance of detecting the SESOI. In the pilot, we observed this probability at 8% in the religious condition and used this observed value in our power analysis. We estimated that to reach 90% statistical power for the H1 test, we would need 450 participants in the religious condition (we use 90% power as advised by R2). Even if this intercept turned out higher than expected (e.g., a 10% probability), we would still have 81% power to detect the SESOI.
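The arithmetic behind this intuition can be illustrated with a short sketch (Python, using the example numbers above; the odds ratios show how much the same percentage-point difference shrinks, on the scale of a logistic model, as the intercept rises):

```python
def odds_ratio(p0, p1):
    # odds ratio for moving from probability p0 to p1
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# the same 10-percentage-point difference at a low vs high intercept
print(round(0.15 / 0.05, 2), round(odds_ratio(0.05, 0.15), 2))  # 3.0 on the raw scale, ~3.35 on the odds scale
print(round(0.65 / 0.55, 2), round(odds_ratio(0.55, 0.65), 2))  # ~1.18 on the raw scale, ~1.52 on the odds scale
```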
E: I would also share a recent experience. I wrote this commentary https://osf.io/preprints/psyarxiv/qjf7m. In the section "Issues with the quality of the data for Muslims", we looked at participants that were filtered on Mturk (not prolific) as muslims and who were asked to describe their religious belonging. 23% declared to be from another religion, which is not low. That said, you should be very careful excluding them in the Qualtrics survey based not only on the filter but on an additional question.
Thank you for pointing us to an interesting paper. Regarding the specific issue you mention (which is also raised by R2): We are well aware of this issue, although we would argue that, compared to MTurk, the quality of Prolific's filtering criteria is much higher. Indeed, in a recent study, we recruited secular and Christian participants separately based on Prolific filters and also added religiosity questions to our questionnaires. Four of the 204 participants filtered by Prolific as not affiliated with a religious tradition selected "Other" religious affiliation. Nonetheless, more nuanced data showed that 11 of the 204 non-affiliated participants agreed that god can punish immoral behavior (although none of them selected "strongly agree").
For the current RR, we will exclude from the analysis participants who believe in the existence of a punitive god or who are affiliated with a religion (questions we ask in the survey, as suggested by R2). We state these additional exclusion criteria in the updated manuscript. Also, based on our pilot data, we expect to exclude ~10% of our sample on these criteria and have increased the planned sample size so that we retain the required statistical power after exclusions.
E: In the beginning of the section on design, it is stated that the survey is available in the OSF folder, however it is not yet. This should be reframed for the RR1.
Sincere apologies; we did make the survey available, but not directly in the RR folder on OSF: it was in the folder of the parent OSF project (i.e., one level up in the folder hierarchy). We regret this oversight, since reading through the survey might have prevented some of the misunderstandings we address below. We have now uploaded the updated survey directly to the OSF RR folder.
E: Following up on this idea, could you provide more information regarding how the game will be played? Indeed, you say that the whole experiment will be conducted on Qualtrics, but I am not sure that it is feasible. I already tried and ended up doing it on otree because I could not figure out how possible it is to make an interactive game between two participants on Qualtrics. This is especially a challenge if you plan to bring participants from prolific to Qualtrics and then to another platform before going back to prolific. I would encourage you to pretest it, but also require explaining more carefully it in the paper. The procedure is well explained, but it lacks some details around the relationship between the survey and the game to ensure a good reproducibility.
The key to our design is that the game is not directly interactive. Indeed, we have previously used oTree for exactly this reason but have since adapted our economic games to be "one-shot". Participants make their decisions without knowing the decisions of other players (as the game requires), and we randomly pair them with other participants after data collection ends to calculate their earnings. Of course, this would not be possible with multiple rounds of the game.
While we stated as much in the previous manuscript, we take your concern seriously and have clarified in the design section that the Nash demand game is one-shot and non-interactive.
R1: It was quite unclear what the modernization versus insecurity "conditions" (which are not listed in the 2x2 design) referred to. Does the modernization plot at figure 1 refer to the US respondents, while the insecurity plot refers to the Poland respondents? This is suggested in the second-to-last paragraph before the methods section, but isn't otherwise clearly stated in the text. The formulation of this environmental factor here is quite confusing before "Insecure" comes up both in the design of one experimental condition (secure vs insecure), and is listed (modernization vs insecure) as an "environmental" factor. This should be clarified, and perhaps renamed.
To clarify: Figure 1 is meant to show the predictions of the two competing theories (hence the labels "modernization" and "insecurity", which we use in the text to refer to these theories). We have now updated the labels to include the word "theory" so that they are not confused with our manipulation. Note that we also renamed the "modernization theory" to "rationalization theory" in the text because this label better captures the mechanism the theory proposes.
R1: I'm not convinced that using the Bible in the religious statement would be appropriate here, because participants might have a specific affect towards the Catholic Church that impacts their reaction independently from the principle of religion itself. Randomizing the religious text the statement is said to come from could limit the effects of this problem.
To alleviate this concern, we made two changes. First, participants now need to spend time reading the text rather than transcribing it; hence, they do not need to see the text before choosing the group. Second, we now describe the text as religious rather than as text from the Bible. Although it could be argued that in both the US and Polish contexts "a religious text" would most likely connote the Bible, this change to a more general description is, in our view, a better alternative to varying the religious tradition the text comes from. In varying the religious tradition, we might encounter even more hostility (e.g., toward the Koran).
Moreover, the revised survey now includes a battery of questions on participants' attitudes toward religion from the Pew research on religious "nones" (Pew Research Center 2024). Specifically, we will ask whether participants believe that religion does more harm than good, causes division and intolerance, encourages superstition and illogical thinking, helps society by giving people meaning and purpose, and encourages people to do the right thing and treat others well. In our pilot data, 60% of participants had mixed views of religion, 39% negative, and 1% positive. We will use these variables in our exploratory analyses to gain further insight into the different reasons people may be (un)willing to select the religious group.
R1: While the text describes this experiment as being mainly about institutions, the tested argument is rather about norms. The authors should clarify this in their framing.
We understand institutions to be packages of social norms (Henrich 2015) and use the term "institution" to index the nature of the social norms in question (religious vs secular). In the current experiment, the normative groups have several norms; hence, we believe our use of the label "institution" is defensible and also helps us better differentiate between the manipulated factor and the choice participants make (instead of using the word "norm" for both).
R1: Do participants know that only other secular respondents are selected? If not (which I assume is the case), their results in the religious condition might be driven by the expectation they have about the percentage of religious respondents within their country. That is to say, in Poland, they might simply perceive that their likelihood of having a game partner that follows religious norms, and is hence more likely to follow these norms, is higher than in the US It would be easy to control for this by asking respondents about the perceived religious makeup of their country.
This is an excellent point. We added a question about the probability of playing with a religious participant in both the normative and non-normative groups. We will use the difference between the answers to these two questions in exploratory analyses, which may help us gain further insight into why participants selected specific groups. Nevertheless, please note that even if secular participants chose to join religious groups because they expect more religious people there, this could still be interpreted as support for the existential insecurity theory. Indeed, even in the longitudinal studies showing that people become more religious in the aftermath of war or natural disaster, we do not know whether they "truly" become more religious or join religious groups because of the services these groups offer. In our exploratory analysis, we will be able to tease these motivations apart (at least for our experimental setup) by looking at whether people who chose the religious normative group did so because they planned to cooperate or to free-ride.
R2: The success of the manipulation check (HMC) seems pivotal to the testability (and interpretability) of the main hypotheses, and it is worth bearing in mind that failure of a crucial manipulation check is one of the few ways in which a Stage 2 RR can be rejected after results are known (see criterion 2A here). For this reason, if I were in the authors' position, I would want to be extremely confident that the reality check of insecure vs secure on the expected amounts passes muster using the same recruitment pipeline on Prolific, and I would probably include this preliminary verification study in the Stage 1 manuscript. In a similar vein, I also noted the absence of a formal power analysis associated with HMC. I leave it to the authors to decide whether confirming HMC in a preliminary experiment is worth doing, but I do think the power analysis associated with HMC is crucial regardless.
We strongly concur with this advice and conducted a pilot study, as described in our response to the editor's concern above. We also report the results of this pilot study in the revised manuscript. Moreover, we selected 15 %pt. as the threshold for accepting the HMC (see above) and included a power analysis for this threshold, showing that with 350 participants in the secular condition, we would have 89.6% statistical power to detect the assumed effect.
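As a rough cross-check of this figure, the power can be approximated with a standard two-proportion calculation (a back-of-the-envelope Python sketch; our actual power analysis is simulation-based, so small discrepancies are expected, and the 16% baseline is taken from our pilot):

```python
from statistics import NormalDist

def power_two_props(p0, p1, n_per_group, alpha=0.05):
    # normal-approximation power for a two-sided two-proportion z-test
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    pbar = (p0 + p1) / 2
    se0 = (2 * pbar * (1 - pbar) / n_per_group) ** 0.5            # SE under H0 (pooled)
    se1 = ((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_group) ** 0.5  # SE under H1
    return nd.cdf((abs(p1 - p0) - z * se0) / se1)

# 350 secular participants split evenly across environments;
# a 15 %pt. SESOI over the 16% baseline observed in the pilot
print(round(power_two_props(0.16, 0.31, 175), 2))  # ~0.91, in the ballpark of the reported 89.6%
```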
R2: To selectively recruit secular participants, the authors plan to rely on Prolific's pre-screening of participants with no religious affiliation. I have heard informally that these pre-screening tools used by Prolific and other online recruitment services can be imperfect, so as a safeguard I would recommend introducing an additional filter based on the demography questions. For example, if a participant scores sufficiently high on religiosity (despite passing Prolific's pre-screening) then perhaps they could be separately excluded and replaced. It's important to consider these kinds contingencies now because exclusion criteria are very difficult to change after Stage 1 acceptance.
We are grateful for this advice and, as mentioned in our reply to the editor, will apply our own exclusion criteria (retaining only religiously unaffiliated, non-believing participants). We now specify these criteria more explicitly in the revised manuscript.
R2: I would recommend setting the minimum power to 90% rather than 80% (which seems a little low for what is otherwise an impressive test of competing theories). It may well be that the authors are close to achieving this already, as I note that the sample size is larger than the minimum needed for 80% power – but it is difficult to know by how much because exact power estimates are not reported. I suggest reported exact power estimates for each hypothesis and (resources permitting) ensuring that all are above 90%.
"Thus, we will recruit 720 participants, 360 per country, which will have >80% statistical power even after excluding participants according to the criteria specified below (estimated exclusion n = 40)." I recommend committing to a minimum sample size post-exclusions rather than attempting to estimate the exclusion rate. It is the sample size that is subjected to analysis after exclusions that will determine the eventual statistical sensitivity.
We now report exact power estimates for all our power analyses and have increased the sample size in the religious condition to reach 90% statistical power for the H1 test. We also assume a 10% exclusion rate and have increased our sample size accordingly.
R2: The study design table on pp13-14 is good (and I particularly liked the hypothetical results plotted in Figure 1, which shows careful consideration of possible outcomes). That said, in the design table I would like to see specific sampling plans defined for each hypothesis rather than (at present) for HA2 only.
We made the requested change and defined sampling plans for both new hypotheses. We also added some of the 'forking paths' suggested by the editor that may affect our results and specified what we would do in each case (see our reply to the editor's first concern).
References:
Henrich, Joseph. 2015. "Culture and Social Behavior." Current Opinion in Behavioral Sciences 3:84–89. doi: 10.1016/j.cobeha.2015.02.001.
Norenzayan, Ara, Azim F. Shariff, Will M. Gervais, Aiyana K. Willard, Rita A. McNamara, Edward Slingerland, and Joseph Henrich. 2016. "The Cultural Evolution of Prosocial Religions." Behavioral and Brain Sciences 39(e1):1–65. doi: 10.1017/S0140525X14001356.
Pew Research Center. 2024. "Religious 'Nones' in America: Who They Are and What They Believe."
Schnabel, Landon, and Sean Bock. 2017. "The Persistent and Exceptional Intensity of American Religion: A Response to Recent Research." Sociological Science 4:686–700. doi: 10.15195/v4.a28.
Stark, Rodney. 1999. "Secularization, R.I.P." Sociology of Religion 60(3):249–73.
Dear Authors,
Thanks for your submission. I now received two reviews with both important suggestions to improve this RR stage 1.
I think that all points raised by reviewer 1 are sound, and parts of the manuscript could benefit from an improvement in clarity. Furthermore, I think all the points can be easily addressed.
Points made by Chris Chambers are more serious, though. In addition, I would like to have an explanation in the last table about what could happen if the HMC does not work, but also if we do find difference between secular and religious in secure environment. I read carefully the paragraph above figure 1 and I have not seen anything regarding this condition and wonder if it would not be a necessary condition to make the rest of the analysis work. To be clear, if you find that in the secure condition, religious people are more normative than secular people, how can you ensure that you will have the possibility to detect a difference in the insecure condition?
I would also share a recent experience. I wrote this commentary https://osf.io/preprints/psyarxiv/qjf7m. In the section "Issues with the quality of the data for Muslims", we looked at participants that were filtered on Mturk (not prolific) as muslims and who were asked to describe their religious belonging. 23% declared to be from another religion, which is not low. That said, you should be very careful excluding them in the Qualtrics survey based not only on the filter but on an additional question.
I also have few suggestions regarding the game played.
- In the beginning of the section on design, it is stated that the survey is available in the OSF folder, however it is not yet. This should be reframed for the RR1.
- Following up on this idea, could you provide more information regarding how the game will be played? Indeed, you say that the whole experiment will be conducted on Qualtrics, but I am not sure that it is feasible. I already tried and ended up doing it on otree because I could not figure out how possible it is to make an interactive game between two participants on Qualtrics. This is especially a challenge if you plan to bring participants from prolific to Qualtrics and then to another platform before going back to prolific. I would encourage you to pretest it, but also require explaining more carefully it in the paper. The procedure is well explained, but it lacks some details around the relationship between the survey and the game to ensure a good reproducibility.
I am looking forward to seeing the revised manuscript.
Best regards,
Adrien Fillon
This registration proposes an experiment to test the effect of insecurity (as principally set by the X factor of a Nash demand game) on the probability of self-selecting into normative religious institutions. All participants are randomly assigned to either a secular or a religious condition, in which they can opt in a group (secular or religious) with norms, or without norms. The hypothesis advanced by the authors is that participants who are in the insecurity condition are more likely to opt into religious normative groups, but only in the non-modernized (i.e. insecure) environment.
I found this experiment interesting, and would recommend accepting it pending the following clarifications.
- It was quite unclear what the modernization versus insecurity “conditions” (which are not listed in the 2x2 design) referred to. Does the modernization plot at figure 1 refer to the U.S. respondents, while the insecurity plot refers to the Poland respondents? This is suggested in the second-to-last paragraph before the methods section, but isn’t otherwise clearly stated in the text. The formulation of this environmental factor here is quite confusing before “Insecure” comes up both in the design of one experimental condition (secure vs insecure), and is listed (modernization vs insecure) as an “environmental” factor. This should be clarified, and perhaps renamed.
- I’m not convinced that using the Bible in the religious statement would be appropriate here, because participants might have a specific affect towards the Catholic Church that impacts their reaction independently from the principle of religion itself. Randomizing the religious text the statement is said to come from could limit the effects of this problem.
- While the text describes this experiment as being mainly about institutions, the tested argument is rather about norms. The authors should clarify this in their framing.
- Do participants know that only other secular respondents are selected? If not (which I assume is the case), their results in the religious condition might be driven by the expectation they have about the percentage of religious respondents within their country. That is to say, in Poland, they might simply perceive that their likelihood of having a game partner that follows religious norms, and is hence more likely to follow these norms, is higher than in the U.S. It would be easy to control for this by asking respondents about the perceived religious makeup of their country.
Overall, I think this design makes a promising contribution, but could be clarified in its design and framing.