DOI or URL of the report: https://osf.io/9p3mj
Version of the report: v2
Revised manuscript: https://osf.io/twehu
All revised materials uploaded to: https://osf.io/pm264/, updated manuscript under sub-directory "PCIRR Stage 1\PCI-RR submission following R&R 2"
Three of the five reviewers kindly returned to evaluate your revised manuscript. As you will see, most of the major issues have been addressed and we are moving closer to Stage 1 IPA. However, there are some remaining points to address concerning the literature review, exclusion criteria, and planned statistical analyses, and the reviewers offer a range of constructive suggestions for resolving these issues. I look forward to seeing your response in a further (hopefully final) Stage 1 revision.
The authors have only partially addressed some of my original comments, and failed to adequately address my core comments/concerns. Below, I go through my original points, and comment on the authors’ responses to each one:
Comment 1) “[…] manuscript is poorly written […] I sincerely hope the authors will make an effort to proof-read their work before they submit it for review.”
=> The authors say that they have addressed grammar/spelling errors. I certainly hope so, but I didn’t go through the trouble of proof-reading the manuscript, so I’ll simply have to trust that the revised version is largely free of errors.
Comment 2) “The authors should cite and discuss other papers (besides Soman, 2001) that have previously tested (and found) sunk-cost effects for time (e.g., Bornstein & Chapman, 1995; Bornstein, Emler, & Chapman, 1999; Frisch, 1993; Navarro & Fantino, 2009; Olivola, 2018; Strough et al., 2008). This is important, since these other papers also speak to whether (and to what extent) there is a sunk-cost for time. In fact, I would suggest the authors provide a table that summarizes these other papers and, for each one, what they found (e.g., whether they found a sunk-cost effect of time).”
=> The authors have largely ignored and seemingly chosen to “dodge” this comment. This is extremely puzzling to me, since this comment is (i) easy to address, and (ii) important for reasons I previously tried to explain, but will now elaborate on further. The authors argue that “the intended scope for this replication is rather narrow”, but this argument fails on several grounds.

First, replicating an effect does not absolve the authors of their responsibility to properly acknowledge and discuss the relevant literature. In this case, the relevant literature clearly concerns a (possible) sunk-cost effect for time (vs. money). Therefore, the authors should cite (and summarize the findings of) any and all prior papers that have tested such an effect, even if it was not the only or main goal of a given paper. Such a discussion is important, as it gives readers a proper sense of the existing evidence concerning time sunk-cost effects (or the lack thereof), as well as possible boundary conditions, etc.

Second, I am not sure how “narrow” the authors want the scope to be, but I am sure that the goal of a replication (and of science, more generally) is not merely to replicate a particular study, but rather to examine the (in)existence of an effect (or set of effects) more generally. Therefore, it is not the Soman (2001) studies and findings per se that are interesting and important, but rather the establishment of a sunk-cost effect for time (or the failure to do so). Consequently, the authors cannot argue that discussing other prior studies that tested and found, or failed to find, time sunk-cost effects is beyond the scope of their manuscript.

Third, were the authors serious (but mistaken, in my opinion) in arguing that their only goal should be a very narrow/specific attempt at replicating Soman’s studies, then their proposal would have to be considered a failure, since they are using a different design, a different subject population, etc. In other words, the authors cannot have it both ways: they cannot claim to be narrowly focused on specifically replicating Soman’s original studies (as a supposed license to ignore other prior papers that tested time sunk-cost effects), while at the same time carrying out a replication that alters (i.e., departs from) many features of Soman’s original studies, and thus generalizes the testing of those findings beyond that paper.

In sum, the authors can either (i) carry out a very narrow (i.e., exact) replication of Soman’s studies, which means they would need to replicate the original design features and collect data from a similar population, OR (ii) generalize their replication to be about a time sunk-cost effect more generally, which allows them to depart somewhat from the original study but also requires that they acknowledge (and not ignore) other prior studies (including the ones I cited above, as well as those provided by Reviewer #4) that tested time sunk-cost effects. In my opinion, the latter (ii) is far more interesting and will make a greater contribution. Regardless of what they decide to do, the authors cannot have it both ways. Just to be clear, I was not suggesting that the manuscript should become a thorough and detailed review of time sunk-cost effects.
However, I think it is well within the scope of the paper to at least offer one table (and maybe even a graph plotting effect sizes) summarizing the results of prior studies involving time sunk-cost effects, and at least one paragraph summarizing what those prior studies found (overall).
Comment 3) “Having the same participants complete all 3 studies in a single session is problematic, as it may cause spillover effects, amplify demand effects, etc. The authors should consider randomly assigning participants to one of the 3 studies (not all 3). Or, at the very least, the main analyses should only focus on the first study that each participant is assigned to (and subsequent analyses can look at all 3 studies within-participant).”
=> It appears--though this is not entirely clear from their response--that the authors agree to carry out analyses on only the first set of studies/conditions that participants see. This should be done regardless of what the within-subject analyses find. In fact, as I previously indicated, the between-subjects analyses (on the first scenario subjects see) should really be the *main* analyses that they carry out (i.e., the within-subject analyses should be the additional/bonus ones; not the other way around). Also, it is critical that participants not know, in advance, that they will be completing multiple studies/conditions on the same question/domain. That is, while participants may be told that they will be completing various studies, they should approach the first sunk-cost scenario without expecting to see additional sunk-cost scenarios (i.e., these should come as a surprise). Otherwise, there could still be spillover effects on the first scenario if subjects know they will be asked about similar things in subsequent parts of the survey. Also, every sunk-cost scenario should appear on a separate (web) page, so that participants’ response to the first scenario is completely unaffected by the next one they are presented with. Finally, the authors need to make sure that they increase their sample sizes so that they have sufficient power (even) for the between-subjects analyses that only focus on the first sunk-cost scenario that each subject sees.
Comment 4) “Another concern, which may lead to a failure to replicate the effect, is that experienced MTurk participants may have been exposed (and some repeatedly) to sunk-cost studies, and this may hinder the effect. The authors should therefore consider limiting the study to MTurk participants who have had relatively little experience (e.g., fewer than 100 MTurk studies completed).”
=> The exclusion question/item that the authors propose to use seems far too vague (“Have you ever seen the materials used in this study or similar before?”), especially since it appears at the very end of the survey. The authors should (also) ask more specific recognition questions, along the lines of “Have you participated in any other study/studies that presented a scenario involving a decision to select a project that someone had already invested time or money in? Have you participated in any other study/studies that presented a scenario involving a decision to select an item that someone had already spent time or money to obtain?”
Comment 5) “I don’t understand the distinction that the authors are trying to draw, here. Chi-Square tests also evaluate whether likelihoods vary across conditions, so the authors are mistaken if they suggest otherwise. I suspect they meant something else, but that it did not come across clearly in their writing.”
=> My confusion stemmed from the fact that the authors seemed to (unintentionally) imply that Chi-Square and Logistic Regression modeled different outcome variables. To avoid confusion, the authors can add (to the revised sentence they propose) that they are testing an interaction, and that this can only be done with Logistic Regression (or, at least, not with a standard Chi-Square test).
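For concreteness, a minimal sketch of such an interaction test in Python/statsmodels; the file and column names (choice, sunk, domain) are hypothetical placeholders, not the authors' actual variables:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("replication_data.csv")  # hypothetical file name

# choice: 1 = chose the sunk-cost option; sunk: 1 = sunk cost present;
# domain: 1 = time condition, 0 = money condition (all names hypothetical).
model = smf.logit("choice ~ sunk * domain", data=df).fit()
print(model.summary())
# The sunk:domain coefficient is the sunk-cost-by-domain interaction term
# that a standard 2x2 chi-square test cannot isolate.
```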
I thank the authors for their responses.
Below I detail some residual issues and suggest remedies. Other than those, I am happy for the authors to proceed with their study.
Outstanding issue 1: exclusion criteria. Your response to my and Johanna Peetz's comments clarifies things greatly (which was needed; these overviews should appear in the final paper, even if summarized or footnoted). I now see that the practices of the services you use (CloudResearch/TurkPrime) help improve data quality, which is of course very welcome (as is your explicit description thereof).
However, I fail to see how that means you should not include explicit and credible criteria within your experiment (i.e., criteria that allow for how inattention/poor data quality may interact with your specific study). The only response to this concern that I can see is that you ask subjects whether they will pay attention (yes/no response options), and then, ex post, whether they were serious (Likert scale) and understood what they read (Likert scale). Asking someone whether they intend to pay attention, or whether they understood, is not the same as checking that they did pay attention or did understand, and is not convincing at all.
It would be convincing to include simple and explicit criteria based on hard data, such as: a minimum time spent on the experiment's crucial pages (e.g., excluding those who spend less than X seconds on a page); or explicit attention checks (questions subjects know the answer to only if they paid attention -- these should be standalone questions aiming to assess attention only, not ones that are also part of the experiment; you could also mention at the start that subjects should pay attention because you will ask comprehension questions).
If your data are guaranteed to be of high quality, as you argue strongly in your responses, then these criteria will not bite: your subjects will all pass them and none will be excluded. But your work would be stronger and your results more trustworthy.
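To make the suggested criteria concrete, here is a minimal sketch of a timing-based exclusion in Python/pandas; the file name, timing columns, attention-check item, and the 15-second threshold (a placeholder for the "X" above) are all hypothetical:

```python
import pandas as pd

df = pd.read_csv("raw_responses.csv")  # hypothetical survey export

MIN_SECONDS = 15  # placeholder for the "X" above; fix when preregistering
timing_cols = ["scenario1_page_seconds", "scenario2_page_seconds"]  # hypothetical

too_fast = (df[timing_cols] < MIN_SECONDS).any(axis=1)    # rushed a crucial page
failed_check = df["attention_check"] != "correct_option"  # hypothetical item
clean = df[~too_fast & ~failed_check]

print(f"Excluded {len(df) - len(clean)} of {len(df)} respondents")
```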
Outstanding issue 2: Please state explicitly (instead of requiring the reader to go and open your Qualtrics project file): can subjects go back to previous pages of the experiment? In your case it would seem important that they cannot go back (including for explicit attention checks).
DOI or URL of the report: https://osf.io/wu6tm/
Revised manuscript: https://osf.io/9p3mj
All revised materials uploaded to: https://osf.io/pm264/, updated manuscript under sub-directory "PCIRR Stage 1\PCI-RR submission following R&R"
This list is not comprehensive, so please be sure to respond point-by-point to every issue raised in the reviews.
After a careful review of this paper along with the provided materials on OSF, I find myself sceptical about the value of the proposed research. The scientific validity of the research question is unclear, and the methods (as proposed) would, in my opinion, not support meaningful conclusions, owing to the questionable data quality of the proposed sample and to confounds and divergences from the original study design. I outline my concerns in more detail below.
Literature review
1. The literature review or background section of this paper is extremely sparse and insufficient in outlining the reasons for the research. The only arguments offered for this replication project are (a) the impact of the original paper and (b) the fact that it has not been (directly) replicated yet. There are many similarly influential papers that have not yet been replicated -- so a fuller explanation of ‘why this one’ seems necessary.
Put differently, what exactly is the scientific value of this replication? Perhaps there is reason to doubt the original effect and a direct replication would allow for falsification of an established assumption. Perhaps identifying the exact effect size of the sunk cost versus sunk time effect would be helpful to other researchers. Perhaps identifying the boundaries of the effect would be helpful. Perhaps showing that the effect can be generalized to online samples in the US more than two decades later could be helpful. As it is, the introduction specifies no concrete scientific question and does not define precise hypotheses either.
Replications do not have to be novel (by definition!), but they do have to provide a justification for replicating a specific piece of research. Such a justification is lacking in the present paper.
2. While there may not be direct replications, there have been indirect replications, which this section fails to mention – such as papers replicating the sunk cost effect for money but not time in different decision contexts (Pandey & Sharma, 2019) and some that did show a sunk time effect in yet other contexts (Navarro & Fantino, 2009; Castillo, Plazola, Ceja, & Rosas, 2020). These should be reviewed, given that they likely bear on the research question (once one is identified). For example, if the main purpose is to establish concrete effect sizes, past indirect research might help with this just as much as this one-time, high-powered MTurk direct replication – in fact, there is likely enough work out there on these questions to support a meta-analytic review already.
In sum, there is a rich literature on sunk cost effects both on time and money that developed over the past two decades since Soman’s (2001) studies and the present paper should clearly situate itself within this literature.
Sample
1. The method section assumes a 5% exclusion rate. It appears that this is based purely on the randomly generated data set used to populate the tables, etc. It would make much more sense to base the estimated exclusion rate on known exclusion rates for online crowdsourced samples. For example, in a very recent study of inattentive MTurk responders, 13% were inattentive even after a number of ex ante data quality checks were in place (Pyo & Maxfield, 2021). Additionally, the ex ante data quality checks in this study seem underdeveloped – there are a number of restrictions in MTurk and Prolific that can be employed to lower the chance of bots that are not mentioned here (e.g., a 95% HIT approval rate).
2. Many resources are available outlining best practices for attention checks on MTurk (Berinsky, Margolis, & Sances, 2014; Pyo & Maxfield, 2021; Thomas & Clifford, 2017), and the current data quality checks do not follow these recommendations as far as I can tell.
The switch from real in-person surveys to online surveys comes with a risk of high inattention or even ‘bots’ producing random noise. In a good-faith replication (especially one where the original in-person collection is changed to online sample pools with notorious attention problems), the steps taken to ensure that participants are real and are actually reading the questions are extremely important. The present data collection plan would not make me confident that any potential null effect is not actually due to poor-quality, inattentive participants.
3. The current data collection plan also does not state whether excluded participants will be replaced, or whether the power calculation refers to the final sample or simply to the number of slots posted on MTurk. Further, is there a planned threshold for the percentage of discarded data at which the study would be deemed failed? Would you still consider the data if, say, 30% of respondents had to be excluded?
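For illustration, a small sketch of how the posted sample could be inflated so that the power calculation refers to the final, post-exclusion sample; the target N is a hypothetical placeholder, and the 13% rate is the Pyo & Maxfield (2021) figure cited above:

```python
import math

n_final = 600          # hypothetical N required by the power analysis
exclusion_rate = 0.13  # inattention rate reported by Pyo & Maxfield (2021)

slots_to_post = math.ceil(n_final / (1 - exclusion_rate))
print(slots_to_post)   # 690: post this many slots to retain ~600 after exclusions
```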
Design
1. The planned design of running all three studies on the same sample of participants is not true to the original study designs. Since participants are paid according to the time it takes to complete the survey, it would require the same resources to run these three studies separately, so the reason for this divergence from the original study is unclear. Note that no reason is given for this considerable change from the original procedure – not even in the table comparing the original and replication methods, which includes a ‘reason for change’ column (Appendix B).
Running separate samples instead would avoid the possibility of undetected bots affecting all three studies rather than just one, and would address the possibility of carry-over effects. For example, if participants thought about and responded in line with a sunk time effect in an earlier question, they might then respond consistently with their earlier responses in Study 5 and would be less likely to be swayed by the education condition. Always presenting Study 5 after Studies 1 or 2 stacks the deck against replicating the Study 5 effect because of these consistency biases in responding.
Even if all studies are administered in the proposed way (Studies 1 and 2 counterbalanced, then Study 5), the authors should check for order effects of the counterbalancing in all analyses. In the currently proposed analyses, potential order effects are not tested.
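One way such an order-effect check could look, sketched in Python/statsmodels via a likelihood-ratio test; the file and column names (including the order dummy) are hypothetical:

```python
import pandas as pd
import scipy.stats as st
import statsmodels.formula.api as smf

df = pd.read_csv("replication_data.csv")  # hypothetical file name

# order: 0 = Study 1 seen first, 1 = Study 2 seen first (hypothetical column).
base = smf.logit("choice ~ sunk * domain", data=df).fit(disp=0)
full = smf.logit("choice ~ sunk * domain + order + order:sunk", data=df).fit(disp=0)

lr = 2 * (full.llf - base.llf)        # likelihood-ratio statistic
p = st.chi2.sf(lr, df=2)              # two parameters added by the order terms
print(f"LR = {lr:.2f}, p = {p:.3f}")  # a small p would flag an order effect
```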
2. I disagree with the authors’ characterization of the IV materials in S3 as being the “same” as the materials in the original study (according to Appendix B, table on replication classification). In the original study, students listened in person to a university lecture on opportunity cost. This information came from a source they take seriously (a professor at the university they attend) and was of considerable length and depth. This is not in any way the ‘same’ as reading 266 words about opportunity cost in an online experiment. Even if participants show comprehension of this information on two follow-up questions, the information is not likely to be processed in anywhere near the same depth.
3. As a follow-up on point 2, the authors do not even plan to exclude people who did not answer one or both of the comprehension questions correctly (this is neither mentioned explicitly nor reflected in the simulated data). So, someone who skims the brief paragraph on opportunity cost and answers at random would still be included as a bona-fide participant. In my opinion, this proposed experiment could not determine whether opportunity cost education has a moderating effect on the sunk time cost effect, because there is no evidence that participants read or processed the information. Of course, the original experiment cannot be sure participants processed the information either, but in that experiment participants were present for an hour-plus lecture – the likelihood that something ‘sticks’ is much higher than for 266 words in an online survey. I am not saying that a meaningful replication of Study 5 cannot be done online, but the proposed way of doing it is in no way equivalent to the original study.
4. A minor inconsistency: in the method description, the central outcome scale is labelled 1-9 (in line with the original scale). However, in the appendix (as well as in the OSF survey) the scale anchors are changed and the scale is labelled 4 – 0 – 4 (not even -4 to +4). I would recommend the authors stay true to the original materials (as they said they did) in all respects.
In sum, while I was initially excited to read about the authors’ plans to replicate this seminal study, I do not believe that the paper and planned study as is will make a meaningful contribution to a well-developed field. However, I could see this project becoming a worthwhile contribution if a specific research question is identified, a fuller literature review is included, and several design changes are made. I do want to commend the authors for their transparency in terms of materials and for the helpful appendices. I may have disagreed with their estimates of how similar the proposed replication is to the original study, but the tables summarizing the authors’ opinion on this were very clearly laid out. The proposed analyses were also clear and well-written.
Signed,
Johanna Peetz
I am happy to see replications of classic effects, and the sunk-cost effect (for time) is no exception. Therefore, I commend the authors for carrying this out. That said, I do have some comments and concerns about the current plan and/or manuscript:
1) First off, the current manuscript is poorly written. In particular, I see a lot of grammar errors that could (and should) have been checked and corrected (e.g., using grammar checks in Word). I sincerely hope the authors will make an effort to proof-read their work before they submit it for review.
2) The authors should cite and discuss other papers (besides Soman, 2001) that have previously tested (and found) sunk-cost effects for time (e.g., Bornstein & Chapman, 1995; Bornstein, Emler, & Chapman, 1999; Frisch, 1993; Navarro & Fantino, 2009; Olivola, 2018; Strough et al., 2008). This is important, since these other papers also speak to whether (and to what extent) there is a sunk-cost for time. In fact, I would suggest the authors provide a table that summarizes these other papers and, for each one, what they found (e.g., whether they found a sunk-cost effect of time).
3) Having the same participants complete all 3 studies in a single session is problematic, as it may cause spillover effects, amplify demand effects, etc. The authors should consider randomly assigning participants to one of the 3 studies (not all 3). Or, at the very least, the main analyses should only focus on the first study that each participant is assigned to (and subsequent analyses can look at all 3 studies within-participant).
4) Another concern, which may lead to a failure to replicate the effect, is that experienced MTurk participants may have been exposed (and some repeatedly) to sunk-cost studies, and this may hinder the effect. The authors should therefore consider limiting the study to MTurk participants who have had relatively little experience (e.g., fewer than 100 MTurk studies completed).
5) On p. 20, the authors write: “In order to address H1, Soman (2001) conducted multiple chi-square tests. Specifically, in Study 2, he showed that in the money condition, the chi-square test found difference between sunk cost and no sunk cost conditions, whereas the same difference was not found for the time condition. A different way to approach H1 is to ask whether the likelihood of picking the option associated with sunk costs (theater performance in Study 1 and rocket engine in Study 2) is different across conditions. To address this question, we conducted a logistic regression analysis for Studies 1 and 2 for both the original and the replication data.”
=> I don’t understand the distinction that the authors are trying to draw, here. Chi-Square tests also evaluate whether likelihoods vary across conditions, so the authors are mistaken if they suggest otherwise. I suspect they meant something else, but that it did not come across clearly in their writing.
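To illustrate why the two methods answer the same question for a single 2x2 comparison, here is a sketch with made-up counts (not data from Soman, 2001):

```python
import numpy as np
from scipy.stats import chi2_contingency
import statsmodels.api as sm

table = np.array([[38, 22],   # sunk cost: chose option / did not
                  [24, 36]])  # no sunk cost: chose option / did not
chi2, p, _, _ = chi2_contingency(table, correction=False)
print(f"chi-square: chi2 = {chi2:.2f}, p = {p:.4f}")

# Equivalent one-predictor logistic regression: condition dummy -> choice.
y = np.repeat([1, 0, 1, 0], table.ravel())                   # choices
x = sm.add_constant(np.repeat([1, 1, 0, 0], table.ravel()))  # condition dummy
print(sm.Logit(y, x).fit(disp=0).summary())
# The test on the condition dummy reaches the same verdict as the chi-square;
# what regression adds is the ability to include an interaction term.
```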
REFERENCES:
Bornstein, B. H., & Chapman, G. B. (1995). Learning lessons from sunk costs. Journal of Experimental Psychology: Applied, 1, 251–269.
Bornstein, B. H., Emler, A. C., & Chapman, G. B. (1999). Rationality in medical treatment decisions: Is there a sunk-cost effect? Social Science & Medicine, 49, 215–222.
Frisch, D. (1993). Reasons for framing effects. Organizational Behavior and Human Decision Processes, 54, 399–429.
Navarro, A. D., & Fantino, E. (2009). The sunk-time effect: An exploration. Journal of Behavioral Decision Making, 22(3), 252–270.
Olivola, C. Y. (2018). The interpersonal sunk-cost effect. Psychological Science, 29(7), 1072–1083.
Strough, J., Mehta, C. M., McFall, J. P., & Schuller, K. L. (2008). Are older adults less subject to the sunk-cost fallacy than younger adults? Psychological Science, 19, 650–652.
1A. The scientific validity of the research question(s).
The authors aim to replicate 3 studies from Soman (2001).
The sunk cost effect is important. Distinctions between different types of sunk costs are important. Replicating studies related to these, e.g., those of Soman (2001), is therefore a worthwhile contribution.
**
1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable.
This is a replication paper; rationale and plausibility are clear. Below I suggest some wording changes from those presented in Table 1.
Hypothesis 1: "More generally" does not make sense - domain is not a generalization of size... Perhaps splitting it into separate hypotheses would make more sense.
Hypothesis 2b: I would not write "Rational" - you expect to be dealing with subjects who exhibit the sunk cost effect, at least when money is sunk (making their choices inconsistent with some textbook "rational" actor).
**
1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).
Order effects. Did Soman randomize the order of Studies 1 and 2, as you will? If yes, did he find an order effect? Please test for an order effect.
Please report the average completion time and the lump sum offered for completion, not only the goal of $7.25/hour. Also, $7.25 is the federal minimum wage, but the minimum differs by state (https://en.wikipedia.org/wiki/List_of_US_states_by_minimum_wage). I imagine it will affect your sample demographics, e.g., education level or employment experience, which could correlate with your outcome measures of interest. If you can, I would pay more. If you cannot, it would at least seem worth discussing.
Was Soman’s sample composed of undergraduate students? MTurkers are a different crowd, and their different demographics may be a driver of the results you find. You cannot do a detailed comparison, as you report that Soman did not disclose detailed demographic information, but there are systematic differences between college students and MTurkers, e.g., in age, experience, and incentives. I encourage a discussion of this potential source of differences, and an analysis of how your subjects’ demographics are associated with the treatments (see the next points).
Logistic regression. You have already conducted chi-squared tests for Studies 1 and 2, and you have predicted proportions from the raw data. Why run logistic regressions with only the treatment (dummy?) variables on the right-hand side? The value I see in regression analysis would be to test whether there are interesting covariates of sunk-cost behavior, not picked up by Soman, that may explain your data, e.g., subject demographics. Please revise, or justify why the regressions you propose add value.
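A sketch of the kind of covariate-augmented regression this comment has in mind; the file and column names are hypothetical:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("replication_data.csv")  # hypothetical file name
m = smf.logit("choice ~ sunk * domain + age + C(education) + C(employment)",
              data=df).fit()
print(m.summary())  # demographic terms show which covariates track sunk-cost behavior
```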
Can you explicitly confirm whether the within-subject design (and analysis) was also used by Soman? (I guess it was not -- but if it was, a comparison is needed.)
Overall, except potentially for the last point above, this is a pure replication paper. That is of course a great goal in itself. But do you want to explain your and/or Soman's results? If nothing else, by discussing how they may vary with demographics or other variables? I did not see any analysis aimed at this, yet it would seem straightforward to add (e.g., via multivariate regression -- see the point above) and potentially enlightening.
**
1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.
Detail seems good and a full transcript is provided.
Please list the details of the Qualtrics implementation somewhere, e.g., availability of a "back" button, time limits, forced responses, etc. The idea being that someone could fully replicate your work with all the same options selected in the software.
Please make your data and analysis code available ex post.
**
1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).
The exclusion criteria seem good, but I think some other good ones are not listed. First, a criterion based on how quickly the main pages were submitted: too fast means it was infeasible that subjects read the text. Second, on a page following the main text (and assuming there is no ability to go back), it would be good to ask questions that check subjects paid attention (questions they would only know the answer to if they read the text). (You ask some questions in Study 5 that require subjects to understand some conceptual information, but those are part of the education treatment.)
Related: you expect only 5% of your sample to be excluded by your criteria (p. 11). My sense is that this may be an underestimate. I believe the present study would be significantly strengthened by stronger exclusion criteria and a correspondingly larger initial sample.
**
Other comments:
I recommend the authors check the following references, especially the last one which looks to some extent at the distinction between time and money in a sunk cost context.
Augenblick, 2016, "The sunk-cost fallacy in penny auctions"
Olivola, 2018, “The interpersonal sunk-cost effect”
Ronayne, Sgroi, and Tuckwell, 2021, "Evaluating the Sunk Cost Effect"
I believe it would be better to qualify the introductory definition of the sunk cost effect as relating to *irreversible* or *unrecoverable* investments of resources, e.g., line 1 of Abstract.
p7 "found that the sunk cost effect was ... not [present] for time": you cannot prove a null. Please re-phrase e.g., "no evidence for an effect of..."
Some examples of unclear writing from early in the manuscript follow. I shall refrain from further comments about the writing, but recommend you have the final manuscript proofread.
a. I do not understand the phrase at the end of the first sentence of the first paragraph of the Introduction: "given that with larger sunk costs are stronger tendencies to further escalate". I would avoid speculating about the ("vicious cycle of") consequences of the SCE, and just discuss the SCE itself, except in the discussion.
b. p1 "yet evidence is sometimes inconsistent with weak effects" does not read well. There are at least two different possible meanings.
c. p7 "appeared" rather than "re-appeared"
I am not sure that 420 citations in 21 years is a huge amount. Also, some people consider Google Scholar a poor citation counter. I find the citation count distracting and unnecessary and would remove it.
Typos I spotted:
Authorship declaration: "is" in line 1.
p10. "have possible detected" should be "possibly".
p15. "we found was", remove "was".
p22. You write "no support for a main effect of sunk type" and then report an effect with significance p = .001...
p22. "Soman found a main effect of sunk presence"
1A. The scientific validity of the research question(s)
- The study seeks to replicate the hypothetical scenarios used in Experiments 1, 2, and 5. I am not sure these are the experiments that the resources should be focused on. Soman (2001) sought to establish a sunk cost effect for time and money DECISIONS – the replication proposed here only seeks to replicate the effects for INTENTIONS. This is a severe mismatch. Soman (2001) used Experiment 6 to validate his previous findings; for this reason, Experiment 6 seems to be the most crucial experiment for his argument, not Experiments 1, 2, and 5. As he states: “Experiment 6 involved real choices made by individuals who had made real investments of time. The results validated Hypotheses 1 and 2a, namely that the sunk-cost effect was not detected in the domain of temporal investments, but it reappeared when the accounting of time was facilitated.”
1B. The logic, rationale, and plausibility of the proposed hypotheses (where a submission proposes hypotheses)
- The submission does not propose new hypotheses.
1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable)
- I appreciate the detail and care the authors have taken in simulating the data and showing the results in an adequate statistical framework (logistic regression).
- The analysis of the preference ratings should be done with a cumulative logit or probit regression, not ANOVA, as the measure is not truly continuous; see:
Liddell, T. M., & Kruschke, J. K. (2018). Analyzing ordinal data with metric models: What could possibly go wrong? Journal of Experimental Social Psychology, 79, 328–348. https://doi.org/10.1016/j.jesp.2018.08.009
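A minimal sketch of the recommended cumulative-logit analysis using statsmodels' OrderedModel; the file and column names are hypothetical:

```python
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("replication_data.csv")  # hypothetical file name
# Treat the 1-9 preference rating as ordered-categorical, not metric.
df["rating"] = df["rating"].astype(pd.CategoricalDtype(ordered=True))

model = OrderedModel(df["rating"], df[["sunk", "domain"]], distr="logit")
res = model.fit(method="bfgs", disp=0)
print(res.summary())
```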
1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses
-sufficient
1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).
- As stated in 1A, I think that a vignette study does not suffice to test the hypotheses proposed by Soman (2001) and to replicate the evidence that supports them.