I thank the authors for thoroughly addressing our comments, and I believe the changes made in response to my and the other reviewers' comments substantially improved the manuscript -- e.g., in clarifying the motivation for conducting the replication, adding methodological details, and discussing possible outcomes and conclusions. I have no further questions and am very much looking forward to the results!
I am happy with the authors' response to the reviews and am pleased to support the implementation of this study.
The authors have replied to all of my previous comments, and I can therefore recommend approval of this Stage 1 Registered Report.
I now endorse this Stage 1 manuscript.
DOI or URL of the report: https://osf.io/mwsjh
Version of the report: v1
I have now obtained four very helpful and constructive reviews of your Stage 1 submission. As you will see, the reviews are overall positive and, in my own reading, I found myself agreeing with their general enthusiasm for replicating this intriguing phenomenon. As is often the case with Registered Reports, the reviews highlight a number of areas that would benefit from clarification and possible design amendments, to ensure that the replication is as well motivated and as diagnostic as possible about the replicability of the original study. In revising, the foremost issues to consider are: deviations from the original methodology (which should be minimised as much as possible, and clearly justified where required); strengthening the motivation for the replication (which should be straightforward to achieve); the sufficiency of the control conditions; the addition of key details concerning the procedures and analysis plans; and clearly stating the conditions under which the results would be deemed to constitute a successful replication of the original findings.
On this basis I am happy to invite a thorough revision and response, which I will likely send back to a subset of the reviewers for re-evaluation.
This registered replication report proposes to replicate two studies from Will et al. (2021) with Japanese participants. The experiments themselves are very straightforward; I have little critique there. I do wonder whether the authors should go ahead and complete the conditional experiments regardless—it would be meaningful to know whether the Medusa effect replicates across diverse image sets cross-culturally.
Overall, I am enthusiastic about pre-registered replications. I do think, however, that the authors could make a stronger case for how their proposed work both replicates *and* extends the literature. For example, they could be much more explicit about why it is meaningful for findings to be replicated across diverse cultures. They touch on this somewhat, but leave their thoughts a bit vague and simply invoke “generalizability.” I think they are selling themselves short by downplaying the potential of the Medusa effect as a culturally generalizable phenomenon. This seems important to raise given the considerable body of work showing cross-cultural differences in perception and social cognition. Generalizability would be especially cool here.
I also think the authors could do more to motivate why *this* particular effect should be replicated. Replication is a good thing in general, but why *this* effect out of the myriad effects in the literature? What makes this effect especially relevant and important, deserving of extra attention?
A bit more motivation could make this interesting replication all the more interesting and impactful.
In this work, the authors aim to replicate the ‘Medusa effect’ (Will et al., 2021, PNAS), according to which people shown in pictures are judged more ‘mindful’ than people shown in ‘pictures of pictures’. Two experiments (1 and 2) will be conducted, and two additional experiments (3a/3b) are also planned in case Exps. 1 and 2 do not show the ‘Medusa’ effect. I think this is a nice proposal and I only have a few observations.
My first comment concerns the theoretical motivations underlying this work. While, on the one hand, I think that the replication of a phenomenon (especially a ‘new’ one, as in this case) is worth a try, on the other hand, I wonder if the authors can expand a bit on the motivations that guided their decision. Testing the generalizability of the Medusa effect in an Asian country is surely of great interest, since we know that different social groups (e.g., Westerners and Easterners) tend to process social stimuli and social scenes differently (see, e.g., Masuda, 2017). For instance, we know that Westerners tend to be less influenced by concurrent, task-irrelevant social stimuli presented in the scene, while Easterners tend to process social scenarios more globally, likely reflecting the collectivist culture typically associated with Asian countries (vs. the more individualistic culture of Western countries). So, I wonder if the same rationale can also be applied here (e.g., can culture, and the different strategies of visual exploration of social scenes adopted by Westerners and Easterners, influence the ratings associated with the L1 or L2 levels?). This is just a speculative interpretation, but I would be happy to hear some comments.
My second comment (related to the previous one) concerns the possibility of not observing the ‘Medusa’ effect in Exps. 1 and 2. The authors identify two main possibilities: 1) the Medusa effect does not exist, or exists only under very limited conditions (pages 15-16); 2) the original stimuli were not adequate to detect the effect. In both cases, I would ask the authors to provide clearer explanations. For 1), please clarify what you mean by ‘very limited conditions’. As for 2), I agree that the mismatch between the ethnicity of the original stimuli (White) and that of the participants to be recruited in Exps. 3a/3b could be a confound, as we know that social perception is deeply shaped by the ethnicity of both the stimulus and the observer. Nevertheless, for 2) as well, please clarify your rationale and provide some supporting references.
Third, when evaluating the ‘mental states’ of others, a key role is played by eye-gaze direction. This is also noted in your introduction when discussing differences between direct- and averted-gaze stimuli. So, I think a few more words could be dedicated to explaining the stimuli you are going to use; in particular, is eye-gaze direction manipulated, or does it remain constant? I guess all faces will be presented with a direct gaze, but this should be clarified.
My final (minor) comment concerns the sample: you state that people aged 19 to 99 could participate in your studies. Given the huge differences in ‘mentalization’ and ‘mind perception’ between young and older adults (see, e.g., Henry et al., 2013), I wonder if a sample with a narrower age range would be preferable, even though, looking at the original work by Will et al. (Exps. 2 and 5), I did not find any information about the age of the participants.
This proposal sets out to replicate a recent study on "the Medusa effect", wherein people are perceived as more real, mindful, and agentic when presented in pictures than in pictures of pictures (Will et al., 2021, PNAS). In particular, the authors will use the same stimuli and procedures as the original study and replicate two experiments: one testing ratings of realness, agency, and experience (Study 1, replicating Exp. 2); the other testing donations in a dictator game (Study 2, replicating Exp. 5). Should those fail, they plan to conduct Studies 3a-b with new stimuli. While I think the Medusa effect is interesting, many aspects of the proposal need to be clarified — including the motivation, important details of the design, and plans for null hypothesis testing. I list these in more detail below, in the hope that they will be helpful to the authors.
The motivation for conducting the present study is simply that the Medusa effect has never been replicated before (p. 6). While I am in perfect agreement with the authors that the Medusa effect is interesting and important, this rationale of course applies to many (if not most) effects that are just as interesting and important. And actually, the original study already contains multiple direct and indirect demonstrations of the effect, and follows open data practices.
More generally, this is not a direct replication testing the replicability of the Medusa effect, because it crucially involves a different sample/culture. This should be highlighted and motivated as a key difference: is there past work suggesting cultural differences in mind perception or in the perception of pictorial abstraction? Do the authors have reason to believe this will make no difference?
I had several comments on the discussion of possible outcomes:
- Will the authors take their results to support H1 and H2 only if all three DVs show an effect? What if only two do? Or one?
- The authors only mention the possibility of failure to replicate the original effect ("If neither H1 nor H2 are supported, the reproducibility of the Medusa effect may be problematic.") but do not discuss it further. This might instead be a good occasion to mention differences between the original and current study (e.g., different sample, different recruitment platform, ...).
- It is unclear why limitations of the stimuli may explain disconfirmation of H1 and confirmation of H2. Why would the stimuli matter for H1, but not H2? And why would they explain a disconfirmation, if they are the same as in the original study?
- Another reason why H2 but not H1 may be confirmed is that the Medusa effect has stronger consequences for implicit behavior, while explicit judgments are more variable; and vice versa, H1 but not H2 may be confirmed if the Medusa effect has stronger consequences for explicit (vs. implicit) behavior.
While I empathize with the authors' preference for not conducting studies in the laboratory, I don't think this should be a determining factor in choosing which experiments to conduct (p. 7: "Since the COVID-19 pandemic is still in the process, we chose not to replicate them."), since this is of course an ongoing, constantly evolving situation. And beyond the online/in-lab question, I think the types of controls employed in Experiment 4 are important, and the authors should consider adopting those as well.
For all studies, the authors need to specify a plan in case the main analyses aren't significant, to determine whether the results are null or inconclusive.
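To make this concrete (this is my own illustration, not something from the authors' plan): one option is an equivalence test against a smallest effect size of interest (SESOI), e.g. via two one-sided tests (TOST), so that a non-significant main test can still be classified as "null" (equivalence test significant) or "inconclusive" (neither test significant). A minimal Python sketch follows; the SESOI of 0.3 rating points and the simulated data are entirely hypothetical placeholders:

```python
import numpy as np
from scipy import stats

def tost_paired(x, y, sesoi):
    """Two one-sided tests (TOST) on paired differences.

    Returns the larger of the two one-sided p-values; equivalence
    (mean difference reliably inside (-sesoi, +sesoi)) is declared
    when this value falls below the chosen alpha.
    """
    diff = np.asarray(x) - np.asarray(y)
    n = diff.size
    se = diff.std(ddof=1) / np.sqrt(n)
    t_lower = (diff.mean() + sesoi) / se  # H0: mean diff <= -sesoi
    t_upper = (diff.mean() - sesoi) / se  # H0: mean diff >= +sesoi
    p_lower = stats.t.sf(t_lower, df=n - 1)
    p_upper = stats.t.cdf(t_upper, df=n - 1)
    return max(p_lower, p_upper)

# Hypothetical data: per-participant mean ratings of L1 vs. L2 images.
rng = np.random.default_rng(1)
l1 = rng.normal(4.0, 1.0, 282)
l2 = l1 - rng.normal(0.05, 0.5, 282)  # near-zero true difference
p = tost_paired(l1, l2, sesoi=0.3)
print(f"TOST p = {p:.3f} -> {'null supported' if p < 0.05 else 'inconclusive'}")
```

A Bayesian criterion (e.g., Bayes factors for the paired contrasts) would serve the same purpose; the key point is simply that the criterion be specified in advance.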
Minor points
Study 1:
- The authors mention that "Similar to Will et al.'s (2021) study, pictorial abstraction is a between-subjects factor." But abstraction is NOT a between-subjects factor in the original study ("Their task was to rate each of the two people shown in an image", p. 6). This should absolutely be corrected.
- Instead, the original study varied the DV (Realness, Agency, and Experience) between subjects. This should absolutely be implemented.
- Were the instructions directly translated from the original ones?
- Will definitions also be provided as in the original study?
- This sentence confused me: "there will be no strict time limitation so that the participants can [...] take no longer than 5 minutes"
- The authors say they will recruit 564 participants, but the table then mentions "more than 564".
- The table mentions "a paired t-test", but aren't the authors conducting three?
- Again, I am not sure how the quality of the stimuli could explain failed replication of H1, if those are the same stimuli as in the original study.
Study 2:
- What is the attention check?
- Why has the maximum donation amount ($10) been lowered (to 1000 yen) when $10 = ~1500 yen?
- Will et al. also had a rating phase in their Exp. 5; why is this omitted here?
- p. 3: The authors define the Medusa effect as a tendency for people to 'evaluate a “picture of a person” as more mindful than a “picture of a picture of a person”', but people aren't rating the mindfulness of pictures (L1), they are rating the mindfulness of the people (L0) shown in those pictures. This should of course be clarified.
I found the writing often unclear, and I worry that a naive reader unfamiliar with the Medusa effect might have a hard time following. I won't list all of the unclear passages here, but will just take the first few sentences of the abstract as examples:
- The very first sentence "Pictures play an important role in containing and expressing information related to the human mind" is puzzling since many pictures are unrelated to the human mind (e.g. landscapes); do the authors mean pictures of people/faces?
- The second sentence was also confusing "compositional differences [...] affect the way we perceive the vast amount of information" since it's unclear what the vast amount of information refers to; do the authors mean the way we perceive people?
- The third sentence contains an incorrect definition of the Medusa effect, as per my point above.
- The fourth sentence also confused me, since realness was never mentioned before (and is different from mindfulness; a rock can be real even though it doesn't have a mind); also, it's unclear what 'dimensions' refers to; do the authors mean 'abstraction' or 'compositionality' instead?
- p. 5: "Following the aforementioned prior study, Will et al. (2021) used five experiments " I am not sure what the 'prior study' refers to.
- p. 5: I don't think eye-tracking is considered physiological data?