DOI or URL of the report: https://osf.io/8gvw7
Version of the report: 2
Revised manuscript: https://osf.io/5ak6p
All revised materials uploaded to: https://osf.io/n9zp4/, updated manuscript under sub-directory "PCIRR Stage 1\PCI-RR submission following R&R"
Two of the three reviewers were available to review your revised manuscript, and we are now very close to being able to award Stage 1 IPA. There are two remaining minor points to consider in the review by Laurens van Gestel -- one concerning a suggested nuance to be added to the introduction (which seems sensible to me) and another concerning the weight that should be attributed to informal claims that have not been published in the peer-reviewed literature. My own personal view is that the academic community sometimes places too much relative weight on the veracity of claims made in peer-reviewed journals compared with other sources (such as preprints), but I agree with the reviewer that presenting the finding only in this informal way makes it difficult to evaluate. I think the best solution in this case is not to edit out the (informal) reference to Thaler (2021) but to acknowledge (perhaps in a footnote) just how informal it is, and that the claim cannot be fully evaluated. That way the reader will be in the best position to make up their own mind.
I recognise that you are on a tight study schedule, but hopefully these remaining edits should take very little time to make. In the meantime, I will begin preparing the Stage 1 recommendation so that we issue IPA immediately following receipt of the final revised submission.
I am happy to see that the reviews contributed to redesigning the study and that the authors appreciated the reviewer comments. I think the manuscript has improved a lot, and am confident that the current study will be a valuable contribution to the literature.
The authors have replied extensively to most of my comments in a satisfying way. There is one remaining point of concern for me, which relates to the discussion of the literature on default effects -- a point I raised before but, in hindsight, did not elaborate on enough. I can follow the authors' reasoning, but would have liked to see a clear distinction in the discussion of the literature between work on defaults specifically (e.g., Jachimowicz et al., 2019) and work on nudging more generally (e.g., Mertens et al., 2022). The authors have now added a definition of nudging in their revised manuscript, and rightfully indicate that nudging should be seen as an umbrella term; I think this is important when discussing the literature.
Publication bias in research on nudging in general is rightfully discussed in the manuscript (Maier et al., 2022; Szaszi et al., 2022), but it is also connected to claims about publication bias in the literature on defaults specifically. To be clear, I would not be surprised if publication bias is also present in the literature on default nudges, but the only peer-reviewed work that I am aware of on exactly this topic currently (perhaps strangely enough) suggests the following: "If anything, our publication bias analyses highlight that larger effect sizes are underreported, suggesting that researchers may not bother to report replications of what are believed to be strong effects." (Jachimowicz et al., 2019, p. 174). Moreover, Szaszi et al. (2022) state: "Until then, with a few exceptions [e.g., defaults (6)], we see no reason to expect large and consistent effects when designing nudge experiments or running interventions."
Therefore, I think it's important to disentangle discussions about the literature on nudging in general and defaults as a specific type of nudge, and would like to suggest the following nuance to be added in the manuscript:
"Several recent analyses concluded that the default effect is medium to strong, and among the strongest of the “nudge” interventions (Hummel & Maedche, 2019; Mertens et al., 2022a; Jachimowicz et al., 2019), though follow-up critiques showed that the literature [on nudging in general] is heavily affected by publication bias and that effects adjusted for it seem much weaker (Maier et al., 2022; Szaszi et al., 2022)."
Finally, regarding the graph mentioned in the talk by Richard Thaler: this is very important data that we should take seriously, but I cannot assess its scientific quality as it has not been published in a peer-reviewed journal (at least, not that I am aware of). I find this a rather difficult point to raise but, as much as Prof. Thaler is to be respected, I believe we cannot build on this graph for the moment. The authors now rightfully acknowledge that this is based on informal reports, but by adding an argument to it (which by itself is plausible, but not published in the literature), it does become quite substantial. I therefore think that this paragraph needs revising, but am also open to leaving it up to the editor to decide what to do with citing unpublished work. Moreover, if I missed something or am wrong in this regard, I would be totally happy to withdraw my concerns about this issue.
Regardless of these points, and as indicated above, I believe the proposed study will be a valuable contribution to the literature. I look forward to seeing the results in the Stage 2 manuscript.
I find the round 2 manuscript of the Stage 1 RR to be much improved. A lot has been clarified. I also welcome the removal of the treatment condition related to past behavior and the addition of the neutral manipulations for the two remaining conditions (status quo and default).
As it stands, I think that the study can go to data collection, and I recommend that the Stage 1 report be accepted as is.
DOI or URL of the report: https://osf.io/jhu2y
Version of the report: 1
Revised manuscript: https://osf.io/8gvw7
All revised materials uploaded to: https://osf.io/n9zp4/, updated manuscript under sub-directory "PCIRR Stage 1\PCI-RR submission following R&R"
I have now received three very detailed and constructive reviews of your Stage 1 submission. Broadly, the reviews are very encouraging and I believe the submission is a promising candidate for eventual Stage 1 in-principle acceptance. Major points to address include clarification of theoretical concepts and study rationale, increased details concerning the contingent analysis plans, and doubts about the validity and fidelity of the past behaviour manipulation (IV3; this concern was flagged by two of the reviewers). The evaluations offer many helpful suggestions for resolving these concerns, including some potentially substantial design changes. I look forward to seeing your response to the reviews and receiving your revised manuscript in due course.
This Stage 1 manuscript proposes very valuable work and an important replication of impactful research. I believe that the current proposal largely meets the criteria that are important for Stage 1 acceptance, such as having a solid research question in light of theory or applications and having a sufficiently detailed protocol that enables replication. Moreover, the sample size seems adequate, the proposed analyses are fit to test the hypotheses, and the questionnaire itself contains manipulation and quality checks. Yet, I have some concerns regarding the operationalization of the default and other methodological decisions.
1) The current study makes a conceptual distinction between status quo and defaults. Although this distinction makes sense conceptually, the operationalization of the default seems odd. In the study, status quo is manipulated by varying, in the scenario, whether the new fixtures have been outfitted with INC light bulbs or CFL light bulbs. Defaults are subsequently manipulated by placing the reference point of the slider either fully in favor of INC bulbs or CFL bulbs, or in the middle. However, preference is not identical to a choice or decision, so it is odd to place a default in a slider for measuring preferences. What is the rationale for this, and wouldn’t it be better to place the default in the choice itself, for example by preselecting one of the two options for the question “In this situation, what will you do?”? In Table 1, Hypothesis 3 is expressed as ‘People are more likely to choose the default option’, but in the currently proposed study, it seems that this is not what is being measured. Rather, it seems that the effect of a default on expressed preferences is measured. I thus believe the conceptual distinction that the authors make between status quo and defaults deserves further refinement in the operationalization of the study.
2) Moreover, why did the authors decide to manipulate past behavior? How trustworthy do the authors believe this manipulation will be? Wouldn’t it be better to measure past behavior by letting participants indicate it themselves? Or, if not, would it be possible to check the trustworthiness of this manipulation?
Minor:
1) The introduction currently contains some repetition.
2) The part about default effects seems unnecessarily critical, perhaps to make the point of distinguishing between default effects and status quo. However, the findings about organ donation defaults are clear regarding consent rates. Mixed results arise when speaking about actual organ donations, which is more of a downstream effect, but the default as such can be effective in stimulating consent rates. Similarly, as the authors point out themselves, defaults are among the most effective types of nudges (e.g., Jachimowicz et al., 2019; Hummel and Maedche, 2019). The Mertens et al. (2022) meta-analysis that has led to controversy covers all types of nudges across all behavioral domains, so this criticism cannot be directly applied to defaults specifically. This section deserves a bit more nuance.
3) Regarding past behavior, I would suggest the authors refer to the literature on habits.
4) Regarding preferences, it may also be interesting to look at how preferences impact status quo effects. See for example, de Ridder et al., 2022 on Nudgeability and work by Venema et al.
5) I applaud the authors for their scientific rigor. Checking comprehension seems very important to do. Yet, this makes the manipulation very explicit and perhaps a little unnatural as well, and this may impact results (e.g., demand effects). As this is a deviation from the original article, this is something that deserves attention from the authors.
6) The scenario now reads “Tomorrow evening the head contractor comes by your home to discuss the last aspects of the addition…” I believe it would be more comprehensible if ‘tomorrow evening’ is replaced by ‘The next evening…’.
7) Incandescent lightbulbs have been phased out in the EU. I am not knowledgeable about current legislation in the USA, where the study will be conducted, but it may be worthwhile to verify across states whether this scenario is still applicable (in case the authors haven’t done so yet).
Thank you for giving me the opportunity to review Stage 1 of “Reference points and decision-making: Impact of status quo, defaults, and past behavior in a conceptual replication and extensions Registered Report of Dinner et al. (2011)” by Yam and Feldman.
The work in question proposes a replication and extension of two of the studies reported in Dinner et al. (2011). The authors suggest that although Dinner et al. (2011) identified their work as pertaining to default effects, they actually had measured status quo bias. As a result, Yam and Feldman plan to replicate Dinner et al.’s (2011) status quo bias findings while also extending said research to the role of default effects.
To do so, the authors provide a detailed study plan with a large planned MTurk sample of N = 1500 US adults. The authors’ basic premise for the need for a replication appears sound, as do the proposed changes to the original study design by Dinner et al. (2011). Below, I offer additional suggestions for further improving the quality of the manuscript and of the planned research.
Writing quality
The manuscript requires careful proofreading. Below, I list some of the examples of typos and errors I came across.
For improved readability, I would also suggest splitting longer run-on sentences. For example, on page 11, the sentence “At the time of writing…” spans six lines.
Definitions and Specifications
Given that the paper’s premise is that Dinner et al. (2011) did not correctly define the studied construct, I encourage the authors to define core concepts such as “nudge.”
Page 10: Your example of status quo bias is “a person, who has been enrolled in a health insurance plan for several years, not acting to change to a new plan that guarantees better coverage and cheaper installments with less risks.” I want to raise two points here. First, in your definition of status quo bias, you present the status quo as created through somebody else’s decision, whereas your example identifies the decision as made by the decision maker of interest. Please specify that status quo bias can result both from choices made by others and choices made by the decision maker of interest. In your design, the status quo is set by somebody else, although you also account for simulated past behavior of the decision maker. It seems relevant to highlight the distinction. Second, please make sure to explain how your example differs from the sunk cost fallacy (as some might argue your example applies to said fallacy).
On page 3, the manuscript suggests author Yam took charge of pre-registration, whereas elsewhere it says no pre-registrations have occurred. If necessary, correct on page 3 as “planned pre-registration.”
Statistics
Page 19: Specify the version of G*Power used. I am surprised by the authors’ choice to apply a power of .95. Although .95 is the default setting in G*Power, the norm in psychology research is to apply a power of .80. Although I realize the authors are proposing a very large sample of N = 1500, for the sake of the power analyses I would still strongly encourage the authors to re-run their power analyses with a power of .80 and to update their planned sample size.
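To illustrate how much the target power level alone moves the required sample size, here is a minimal sketch of a two-group power calculation. The effect size (d = 0.2) and the two-sample z-test framing are placeholder assumptions for illustration, not values taken from the manuscript under review.

```python
# Hypothetical illustration: per-group n for a two-sample z-test at
# alpha = .05 (two-sided), comparing power = .95 vs power = .80.
# The small effect size d = 0.2 is a placeholder, not the authors' value.
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float, power: float) -> float:
    """Per-group sample size for a two-sample z-test with equal groups."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = z.inv_cdf(power)           # quantile for desired power
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

n_95 = n_per_group(0.2, 0.05, 0.95)  # roughly 650 per group
n_80 = n_per_group(0.2, 0.05, 0.80)  # roughly 392 per group
```

Under these placeholder assumptions, moving from .95 to .80 power cuts the per-group requirement by roughly 40%, which is why the choice of power level deserves explicit justification.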
The authors did not specify planned analyses should the obtained data violate the assumptions of parametric tests. I encourage the authors to put in place planned tests to examine the data distribution and to propose non-parametric alternatives for the planned analyses.
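One way such a contingency could be pre-registered is as a simple decision rule: check each group for normality and fall back to a rank-based test if the check fails. The sketch below is only an illustration of that idea; the function name, the Shapiro-Wilk check, and the Mann-Whitney fallback are my assumptions, not the authors' plan.

```python
# Illustrative pre-registerable decision rule (hypothetical, not the
# authors' analysis plan): parametric test if both groups pass a
# normality check, rank-based fallback otherwise.
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Independent-samples t-test if both groups look normal
    (Shapiro-Wilk), otherwise a Mann-Whitney U test."""
    if all(stats.shapiro(g).pvalue > alpha for g in (a, b)):
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "mann-whitney", stats.mannwhitneyu(a, b).pvalue

rng = np.random.default_rng(0)
label, p = compare_two_groups(rng.normal(size=50), rng.normal(0.5, size=50))
```

Writing the rule down in this form before data collection removes the analytic degree of freedom the comment is pointing at.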
Page 34: add ‘between’ or ‘within’ to factor concerning default effects.
Design
As the authors point out, the presence of pre-installed light bulbs acts as a status quo for the decision maker. To properly study the distinction between status quo bias and default effects, it seems to me that replicating the Dinner et al. (2011) task design without pre-installed light bulbs, but with a default option pre-selected for the decision maker, would be a more direct and straightforward way of testing default effects in the context of Dinner et al. (2011). Rather than making the study’s design more complex by including another condition, I would encourage the authors to consider including a second sample which takes this approach (and this approach can also examine past behavior as a reference point, etc.). If doing so falls outside of the authors’ means or the scope of this manuscript, I would encourage the authors to justify the choice not to directly run such a test, or to discuss the limitations of their extension.
For any study, I would always advocate for assessing information concerning participants’ demographic background. Given the authors are testing a decision with financial consequences (e.g., spending more vs. less), I would expect an assessment of participants’ income or sociodemographic background.
As per the reviewer guidelines, I am signing this review: Julia Nolte