Looking (again) at Medusa: does pictorial abstraction influence mind perception?

Based on reviews by Alan Kingstone, Brittany Cassidy and 3 anonymous reviewers
A recommendation of:

The Medusa effect: A registered replication report of Will, Merritt, Jenkins, and Kingstone (2021)

Submission: posted 18 August 2022
Recommendation: posted 09 February 2023, validated 09 February 2023
Cite this recommendation as:
Chambers, C. (2023) Looking (again) at Medusa: does pictorial abstraction influence mind perception? Peer Community in Registered Reports.

The Medusa effect is a recently described phenomenon in which people judge a person to be more mindful when they appear as a picture than as a picture within a picture. Across a series of experiments, Will et al. (2021) reported that at higher levels of abstraction, images of people were judged lower in realness (how real the person seemed), experience (the ability to feel) and agency (the ability to plan and act), and also benefited less from prosocial behaviour. The findings provide an intriguing window into mind perception – the extent to which we attribute minds and mental capacities to others.
In the current study, Han et al. (2023) propose a close replication of two experiments from the original report by Will et al. (2021), asking first, whether the level of pictorial abstraction influences ratings of realness, agency and experience, and second, whether it also influences prosocial behaviour as measured in the dictator game (with participants predicted to allocate more money to recipients presented as pictures than as pictures within pictures). In the event of a non-replication using the original materials, the authors will further repeat the experiments using newly generated stimuli that are better matched for cultural context and more tightly controlled along various dimensions.
The Stage 1 manuscript was evaluated over two rounds of in-depth review. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol:
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA. 
List of eligible PCI RR-friendly journals:
References:
1. Will, P., Merritt, E., Jenkins, R., & Kingstone, A. (2021). The Medusa effect reveals levels of mind perception in pictures. Proceedings of the National Academy of Sciences, 118(32), e2106640118.
2. Han, J., Zhang, M., Liu, J., Song, Y., & Yamada, Y. (2023). The Medusa effect: A registered replication report of Will, Merritt, Jenkins, and Kingstone (2021), in principle acceptance of Version 2 by Peer Community in Registered Reports.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviewed by anonymous reviewer 1, 04 Feb 2023

I thank the authors for thoroughly addressing our comments, and I believe the changes made in response to my and the other reviewers' comments substantially improved the manuscript -- e.g., in clarifying the motivation for conducting the replication, adding methodological details, and discussing possible outcomes and conclusions.  I have no further questions and am very much looking forward to the results!

Reviewed by , 31 Jan 2023

I am happy with the authors' response to the reviews, and am pleased to support the implementation of this study.

Reviewed by anonymous reviewer 2, 11 Jan 2023

The authors have replied to all my previous comments and I can therefore suggest the approval of this Stage 1 Registered Report.

Reviewed by anonymous reviewer 3, 06 Jan 2023

I now endorse this Stage 1 manuscript.

Evaluation round #1

DOI or URL of the report:

Version of the report: v1

Author's Reply, 03 Jan 2023

Decision posted 23 Oct 2022, validated 24 Oct 2022

I have now obtained four very helpful and constructive reviews of your Stage 1 submission. As you will see, the reviews are overall positive and, in my own reading, I found myself agreeing with their general enthusiasm for replicating this intriguing phenomenon. As is often the case with Registered Reports, the reviews highlight a number of areas that would benefit from clarification and possible design amendments, ensuring that the replication is as well motivated and diagnostic as possible about the replicability of the original study. In revising, foremost issues to consider are deviations from the original methodology (which should be minimised as much as possible, and clearly justified where required), strengthening the motivation for the replication (which should be straightforward to achieve), sufficiency of control conditions, addition of key details concerning the procedures and analysis plans, and clearly stating the conditions under which the results would be deemed to constitute a successful replication of the original findings.

On this basis I am happy to invite a thorough revision and response, which I will likely return to a subset of the reviewers for re-evaluation.

Reviewed by , 18 Oct 2022

Reviewed by , 12 Oct 2022

This registered replication report proposes the replication of two studies included in Will et al., 2021 in Japanese participants. The experiments themselves are very straightforward; I have little critique there. I do wonder if the authors should go ahead and complete the conditional experiments—it would be meaningful to know if the Medusa effect replicates across diverse image sets cross-culturally.
Overall, I am enthusiastic about pre-registered replications. I do think, however, that the authors could be stronger about how their proposed work replicates *and* extends the literature. For example, the authors could be much stronger about how it is meaningful for the literature for findings to be replicated across diverse cultures. They touch on it some, but leave their thoughts a bit vague and just describe “generalizability.” But, they are selling themselves short, I think, by downplaying the potential for the Medusa effect as culturally generalizable. This may be important to bring up considering much work showing perceptual and social cognitive cross-cultural differences. Generalizability would be especially cool here.
I also think the authors could be stronger in motivating why *this* particular effect should be replicated. Replication is a good thing altogether, but why *this* effect out of the myriad effects in the literature? What makes this effect especially relevant/important to pay extra attention to?
A bit more motivation could make this interesting replication all the more interesting and impactful.

Reviewed by anonymous reviewer 2, 13 Oct 2022

In this work, the authors want to replicate the ‘Medusa effect’ (Will et al., 2021, PNAS), according to which pictures of people are judged more ‘mindful’ than ‘pictures of pictures of people’. Two experiments (1 and 2) will be conducted, and two additional experiments (3a/3b) are planned in case Exps. 1 and 2 do not show the ‘Medusa’ effect. I think this is a nice proposal and I only have a few observations.

My first comment is about the theoretical motivations underlying this work. While, on the one hand, I think that the replication of a phenomenon (especially when it is ‘new’, as in this case) is worth a try, on the other hand, I am also wondering if the authors can expand a bit on the motivations that guided their decision. Testing the generalizability of the Medusa effect in an Asian country is surely of great interest, since we know that different social groups (e.g., Westerners and Easterners) tend to process social stimuli and social scenes differently (see, e.g., Masuda, 2017). For instance, we know that Westerners tend to be less influenced by concurrent social stimuli that are presented in the scene and that are task-irrelevant, while Easterners tend to process social scenarios more globally, likely reflecting the collectivist culture (vs. the more individualistic culture of Western countries) that is typically associated with Asian countries. So, I am wondering if the same rationale can also be applied here (e.g., can culture, and the different strategies of visual exploration of social scenes expressed by Westerners and Easterners, influence the ratings associated with L1 or L2 levels? This is just a speculative interpretation, but I would be happy to hear some comments).

My second comment (related to the previous one) is about the possibility of not observing the ‘Medusa’ effect in Exps. 1 and 2. The authors identified two main possibilities: 1) the Medusa effect does not exist, or it exists only under very limited conditions (pages 15-16); 2) the original stimuli were not adequate to detect the effect. In both cases, I would ask the authors to provide clearer explanations; in particular, for 1) please clarify what you mean by ‘very limited conditions’. As for 2) I do agree that ethnic membership of original stimuli (White) and the participants that will be recruited in Exps. 3a/3b could be confounding, as we know that social perception is deeply shaped by the ethnicity of both the stimulus and the observer. Nevertheless, also for 2) please clarify your rationale by also providing some references.

Third, when evaluating the ‘mental states’ associated with others, a key role is played by eye-gaze direction. This is also reported in your introduction when discussing differences between direct and averted eye-gaze stimuli. So, I think a few more words could be dedicated to explaining the stimuli you are going to use; in particular, is the eye-gaze direction manipulated, or does it remain constant? I guess all faces will be presented with a direct gaze, but this should be clarified.

My final (minor) comment is about the sample: you stated that people aged 19 to 99 years could participate in your studies. Given the huge differences in ‘mentalization’ and ‘mind perception’ characterizing young and old adults (see, e.g., Henry et al., 2013), I wonder if samples with a narrower age range would be preferable, even though, looking at the original work by Will et al. (Exps. 2 and 5), I did not find any information about the age of the participants.

Reviewed by anonymous reviewer 1, 13 Oct 2022

This proposal sets out to replicate a recent study on "the Medusa effect", wherein people are perceived as more real, mindful, and agentic, when they were presented as pictures compared to as pictures of pictures (Will et al., 2021, PNAS). In particular, the authors will use the same stimuli and procedures as the original study and replicate two experiments: one testing ratings of realness, agency and experience (Study 1, replicating Exp. 2); and the other testing donations in a dictator game (Study 2, replicating Exp. 5). Should those fail, they plan to conduct Studies 3a-b with new stimuli. While I think the Medusa effect is interesting, I think many aspects of the proposal need to be clarified — including the motivation, important details of the design, and plans for null hypothesis testing. I list these more in detail below, in hope they will be helpful to the authors.

The motivation for conducting the present study is simply that the Medusa effect has never been replicated before (p. 6). While I am in perfect agreement with the authors that the Medusa effect is interesting and important, this rationale of course applies to many (if not most) effects that are just as interesting and important. And actually, the original study already contains multiple direct and indirect demonstrations of the effect, and follows open data practices.

More generally, this is not a direct replication testing the reproducibility of the Medusa effect, because it crucially involves a different sample/culture. This should be highlighted and motivated as a key difference: is there past work suggesting cultural differences in mind perception or abstraction perception? Do the authors have reason to believe this will make no difference?

I had several comments on the discussion of possible outcomes:
- Will the authors take their results to support H1 and H2 only if all three DVs show an effect? What if only two do? Or one?
- The authors only mention the possibility of failure to replicate the original effect ("If neither H1 nor H2 are supported, the reproducibility of the Medusa effect may be problematic.") but do not discuss it further. This might instead be a good occasion to mention differences between the original and current study (e.g., different sample, different recruitment platform, ...).
- It is unclear why limitations of the stimuli may explain disconfirmation of H1 and confirmation of H2. Why would the stimuli matter for H1, but not H2? And why would they explain a disconfirmation, if they are the same as in the original study?
- Another reason why H2 but not H1 may be confirmed is that the Medusa effect has stronger consequences for implicit behavior, while explicit judgments are more variable; and vice versa, H1 but not H2 may be confirmed if the Medusa effect has stronger consequences for explicit (vs. implicit) behavior.

While I empathize with the authors' preference for not conducting studies in the laboratory, I don't think this should be a determining factor in choosing which experiments to conduct (p. 7: "Since the COVID-19 pandemic is still in the process, we chose not to replicate them."), since of course this is an ongoing, constantly evolving situation. And beyond the online/in-lab situation, I think the types of controls employed in Experiment 4 are important, and the authors should consider adopting those as well.

For all studies, the authors need to specify a plan in case the main analyses aren't significant, to determine whether the results are null or inconclusive.
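
One common way to draw this distinction is an equivalence test (two one-sided tests, TOST) on the paired L1 minus L2 differences. The sketch below is purely illustrative and is not the authors' analysis plan: the function name `paired_tost`, the equivalence bound, and the normal approximation to the t distribution (reasonable only for large samples) are all assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_tost(l1, l2, bound):
    """Two one-sided tests (TOST) on paired differences l1 - l2.

    `bound` is a hypothetical smallest effect of interest in raw rating
    units; the equivalence region is (-bound, +bound).
    Assumption: a normal approximation replaces the t distribution,
    which is only reasonable for large samples.
    Returns the mean difference and the TOST p-value.
    """
    diffs = [a - b for a, b in zip(l1, l2)]
    n = len(diffs)
    m = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)

    # Test 1: H0 is "true mean difference <= -bound".
    p_lower = 1 - NormalDist().cdf((m + bound) / se)
    # Test 2: H0 is "true mean difference >= +bound".
    p_upper = NormalDist().cdf((m - bound) / se)

    # Equivalence is claimed only if both one-sided tests reject,
    # so the overall p-value is the larger of the two.
    return m, max(p_lower, p_upper)
```

Under this sketch, a non-significant main test combined with a significant TOST result would suggest an effect too small to matter (a null result), whereas a non-significant result on both would be inconclusive. Bayes factors are another commonly used alternative.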

Minor points

Study 1:
- The authors mention that "Similar to Will et al.'s (2021) study, pictorial abstraction is a between-subjects factor." But abstraction is NOT a between-subjects factor in the original study ("Their task was to rate each of the two people shown in an image", p. 6). This should absolutely be corrected.
- Instead, the original study varied the DV (Realness, Agency, and Experience) between subjects. This should absolutely be implemented.
- Were the instructions directly translated from the original ones?
- Will definitions also be provided as in the original study?
- This sentence confused me: "there will be no strict time limitation so that the participants can [...] take no longer than 5 minutes"
- The authors say they will recruit 564 participants, but the table then mentions "more than 564".
- The table mentions "a paired t-test", but aren't the authors conducting three?
- Again, I am not sure how the quality of the stimuli could explain failed replication of H1, if those are the same stimuli as in the original study.

Study 2:
- What is the attention check?
- Why has the maximum donation amount ($10) been lowered (to 1000 yen) when $10 = ~1500 yen?
- Will et al. also had a rating phase in their Exp. 5; why is this omitted here?

- p. 3: The authors define the Medusa effect as a tendency for people to 'evaluate a “picture of a person” as more mindful than a “picture of a picture of a person”', but people aren't rating the mindfulness of pictures (L1), they are rating the mindfulness of people (L0) in those pictures. This should of course be clarified.

I found the writing often unclear, and I worry a naive reader unfamiliar with the Medusa effect might have a hard time following. I won't list all instances here, but I will just take the first few sentences from the abstract to exemplify:
- The very first sentence "Pictures play an important role in containing and expressing information related to the human mind" is puzzling since many pictures are unrelated to the human mind (e.g. landscapes); do the authors mean pictures of people/faces?
- The second sentence was also confusing "compositional differences [...] affect the way we perceive the vast amount of information" since it's unclear what the vast amount of information refers to; do the authors mean the way we perceive people?
- The third sentence contains an incorrect definition of the Medusa effect, as per my point above.
- The fourth sentence also confused me, since realness was never mentioned before (and is different from mindfulness; a rock can be real even though it doesn't have a mind); also, it's unclear what 'dimensions' refers to; do the authors mean 'abstraction' or 'compositionality' instead?

- p. 5: "Following the aforementioned prior study, Will et al. (2021) used five experiments": I am not sure what the 'prior study' refers to.

- p. 5: I don't think eyetracking is considered physiological data?
