Gaining confidence in Michotte’s classic studies on the perception of causality

Based on reviews by Maxine Sherman and 1 anonymous reviewer
A recommendation of:

Michotte's research on perceptual impressions of causality: a pre-registered replication study


Submission: posted 03 May 2023
Recommendation: posted 21 October 2023, validated 22 October 2023
Cite this recommendation as:
Syed, M. (2023) Gaining confidence in Michotte’s classic studies on the perception of causality. Peer Community in Registered Reports.


Making causal judgements is part of everyday life, whether seeking to understand the actions of complex humans or the relations between inanimate objects in our environments. Albert Michotte’s (1963) classic book, The perception of causality, contained an extensive report of experiments demonstrating not only that observers perceive causality between inanimate shapes, but that they do so in manifold ways, creating different “causal impressions.” This work has been highly influential across psychology and neuroscience.
In the current study, White (2023) proposes a series of experiments to replicate and extend Michotte’s work. Although this research is foundational to current work on the perception and understanding of causal relations, it has never been subjected to rigorous replication. Moreover, like much research from that era, Michotte’s reports were sparse on methodological detail and did not rely on statistical analysis. White has proposed an ambitious set of 14 experiments that directly replicate and, in some cases, extend Michotte’s experiments.
The Stage 1 manuscript was evaluated over three rounds of in-depth review, the first two rounds consisting of detailed comments from two reviewers and the third round consisting of a close read by the recommender. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and was therefore awarded in-principle acceptance (IPA). 
URL to the preregistered Stage 1 protocol: (under temporary private embargo)
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI-RR-friendly journals:

References

1. Michotte, A. (1963). The perception of causality (T. R. Miles & E. Miles, trans.). London: Methuen. (English translation of Michotte, 1954).
2. White, P. A. (2023). Michotte's research on perceptual impressions of causality: A registered replication study. In principle acceptance of Version 3 by Peer Community in Registered Reports. 
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Evaluation round #2

DOI or URL of the report:

Version of the report: 2

Author's Reply, 12 Oct 2023

Decision by Moin Syed, posted 03 Oct 2023, validated 04 Oct 2023

October 3, 2023

Dear Authors,

Thank you for submitting your revised Stage 1 manuscript, “Michotte's research on perceptual impressions of causality: a pre-registered replication study” to PCI RR.

I received additional evaluations from the two reviewers who previously commented on your submission, and I also reviewed the proposal carefully myself. We all believe that the revised version is much stronger and clearer, with only a few small issues remaining—see reviewer comments for details.

I will handle the revised version myself rather than sending it back to the reviewers, and I will do so as quickly as possible after submission. My expectation is that I will then be able to issue an in-principle acceptance (IPA) so that you can get started with your research.

Thank you for submitting your work to PCI RR, and I look forward to receiving your revised manuscript.


Moin Syed

PCI RR Recommender

Reviewed by Maxine Sherman, 02 Oct 2023

Thank you to the author for the thorough revisions. Everything is much clearer now – I’ve responded to the response document point-by-point and I’ve only got a couple of minor comments. 


[1] I appreciate the author's point about keeping the method details in the intro - that's fine


[2] Perhaps I've misunderstood the response but I'm not sure how trial-wise reports wouldn't affect statistical comparisons between the means, given that the means are calculated over those trial-wise reports. If the author is particularly unlucky and recruits many particularly unmotivated participants who don’t complete the task properly (e.g. they give mutually exclusive responses such as giving high ratings to both items in Exp. 5), then surely the data from those participants would not be able to address the research questions. However, I sympathise with the point about not wanting to arbitrarily remove data – one alternative could be to present some visualisation of the trial data. For example, a scatterplot depicting the relationship between responses on scale A vs scale B. There would be no need to do any inferential statistics - it would just be an illustration of data quality, and show the reader (hopefully) that the responses were mostly sensible.

Following from this, a minor point: the new text on p.15 L369 would need to be committal for an RR, so e.g. "Data from all participants will be included unless there has been technological failure in data recording" rather than "It is anticipated that [...]"


[3] Ah I see, I had overlooked that participants only complete one trial per condition. Apologies - my mistake.


[4] Ok, thanks


[5] Thanks for the helpful addition


[6, Experiment 1]: Sorry, I may not have been clear in my question about the Experiment 1 analysis. I meant to ask what pattern would need to be seen for it to be classed as a transition – would higher ratings on the launching than the passing scale for any width other than the narrowest be sufficient?

On my understanding, evidence in favour of the hypothesis would be a significant negative linear trend of width on passing ratings and a significant positive linear trend of width on launching ratings (so, wider implies launching). Then, no transition is tested for statistically – there’s just the prediction of an opposite effect of width on passing vs launching. To then assert the more specific point that participants actually report a passing impression for narrow objects, there’d need to be a follow-up one-sample t-test comparing launching ratings against 5, above which the report indicates a passing impression (some of the text in Table 4 has been cut off so I can’t read it fully, but I think this may be in there already).


[6, Experiment 2]: Ok I understand – thank you, there’s no need to go into further depth. It sounds as though you won’t be interpreting the results, but simply discussing them speculatively, which is entirely reasonable of course. Perhaps changing the phrasing in the table to something like “unexpected interactions will not be interpreted, but will be discussed in the Discussion” would make the point clear?

Re t-tests/LMMs: Ok, thank you for the explanation – it’s clearer to me now. I’d suggested t-tests because, in my view, one-way ANOVAs on two levels may confuse the reader (i.e. they may not understand the rationale). I’m happy to leave it to the author here: could they either be changed to t-tests, or could a brief note be added that you’ve selected ANOVAs for consistency with the other tests?


[6, Experiment 3]: Yes that should be fine.


[6, Experiment 4+]: Thanks


[6, Experiment 7+]: Sorry for not being clear – assuming I understood the analysis plan properly, the reason the ANOVAs wouldn’t test the hypothesis is that ANOVA can only tell you about condition differences. It can’t tell you whether the values exceed the midpoint of the scale – for that you’d need the one-sample t-tests.
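To illustrate the reviewer’s point, here is a minimal Python sketch of a one-sample t-test against the scale midpoint. The data are simulated; the 0–10 scale, midpoint of 5, and n = 50 are assumptions for illustration only, not values from the manuscript.

```python
# Testing whether mean ratings exceed the scale midpoint (5), which an
# ANOVA on condition differences cannot do. Simulated data throughout.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
midpoint = 5.0

# Hypothetical ratings for one condition on a 0-10 scale
ratings = rng.normal(loc=6.5, scale=1.5, size=50).clip(0, 10)

# One-sided test: do ratings significantly exceed the midpoint?
t_stat, p_value = stats.ttest_1samp(ratings, midpoint, alternative="greater")
```

In a full analysis plan, one such test would be run per condition, with a correction for the number of tests.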


[7] Ok, that’s understandable. Could this be explained in the Methods?


[8] Thanks


[9] I understand - "deliberation" sounds reasonable to me, thanks. 


[10] Oh, I apologise, that’s my misunderstanding. 

Reviewed by anonymous reviewer 1, 31 Jul 2023

I am happy with the revised version of this manuscript.

I realised that my first comment was misleading: I just meant that the introduction could be split into shorter sub-paragraphs, to aid the reading of a relatively long section (I did not mean to suggest adding new paragraphs or sentences). I am sorry for this misunderstanding, and I am sure the author can consider this suggestion in a new version of the manuscript.

Evaluation round #1

DOI or URL of the report:

Version of the report: 1

Author's Reply, 28 Jul 2023

Decision by Moin Syed, posted 17 Jul 2023, validated 17 Jul 2023

Thank you for submitting your Stage 1 manuscript, “Michotte's research on perceptual impressions of causality: a pre-registered replication study” to PCI RR.

The reviewers and I were all in agreement that you are pursuing an important project, but that the Stage 1 manuscript would benefit from some revisions. Accordingly, I am asking that you revise and resubmit your Stage 1 proposal for further evaluation.

I agree with the reviewers that the proposed project has high scientific value, is well-motivated, and that the methodology is described in admirable detail. There are two main issues that will require greater attention, both highlighted by the reviewers.

First, I agree with Reviewer 1 that there needs to be greater clarity and specificity on how the hypotheses will be tested. The reviewer provides many specific examples.

Second, both reviewers commented on the need to refine the power analysis. You should determine the smallest effect size of interest, and then power each analysis to detect it. This section should be as detailed as possible.

When submitting a revision, please provide a cover letter detailing how you have addressed the reviewers’ points.

Thank you for submitting your work to PCI RR, and I look forward to receiving your revised manuscript.

Moin Syed

PCI RR Recommender

Reviewed by Maxine Sherman, 30 May 2023

Michotte’s studies are a great target for replication and I read the manuscript with interest. Though I think this will be a very important piece of work given the influence of Michotte’s research, I found keeping track of 14 studies (some with multiple manipulations) quite burdensome. Once the results are in, the Stage 2 manuscript will be considerably longer, and I think attempting replication of this many effects may be too much for one paper unless it’s possible to make the Introduction and Methods considerably more concise. Perhaps some of Michotte's methodological details could be relegated to supplementary materials, and key information placed into a table/figure?


The methodology is described in considerable detail; however, the analysis plan isn’t sufficiently specified to fully restrict researcher degrees of freedom. For example:

i. What data quality checks will there be, if any? For example, if a participant gives a high rating to both “The initially moving rectangle made the other rectangle move by bumping into it” and “The initially moving rectangle passed across the other rectangle, which moved little or not at all” what will happen to the trial and/or participant data?


ii. Apologies if I have missed this, but I couldn't figure out what exactly the DV will be. Will it be the mean rating? Table 1 suggests that there will be one ANOVA per study; however, when there are ratings on multiple statements, surely there will be multiple DVs and therefore multiple ANOVAs? What correction for multiple tests will be conducted? Alternatively, if the ratings are to be combined into one DV, how will this be done?


iii. Following from the above, might it be more straightforward in terms of analysis to have participants give a categorical report on their impression (“what was your impression, A, B or C?”) and follow that up with a continuous intensity rating (“please rate the intensity of your impression from 1-10”)?


iv. Will any sphericity corrections be used? 


v. Given the number of tests that will be conducted, and therefore the high type 1 error rate, how will unexpected interactions (e.g. between speed and width in Experiment 1) be interpreted? 



I also have some concerns about the proposed analyses as described in Table 2; however, it is possible I have misunderstood. If so, I apologise – could the analyses to be conducted please be clarified in the text?

Experiment 1: this suggests the ratings on different statements will be directly compared – “Transition from high passing ratings at low width to high launching ratings at high width would be successful replication”. Could you explain how “transition” will be tested for?

Experiment 2: “Significantly higher launching ratings for standard than for camouflage stimuli would be successful replication. All other results would be failure to replicate. Reported effect of fixation were not interpreted by Michotte; interpretation here will depend on results” Could these possible interpretations please be given?

Also, the text here and in Table 1 suggests to me that the Experiment 2 analyses involve running a 1-way Fixation (yes, no) ANOVA and another 1-way Stimulus (camouflage, standard) ANOVA for each of the 5 camouflage stimuli, i.e. 10 ANOVAs (potentially ×2 for each statement reported on). Is that correct? If so, why ANOVA and not t-tests? To avoid running 10-20 ANOVAs, a linear mixed model that permits different means for the 5 different stimuli would probably be more appropriate. Something like Rating ~ Fixation + Stimulus + (1 | participant ID) + (1 | stimulus ID).
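As a rough illustration of this kind of mixed model, here is a simulated Python sketch using statsmodels. All numbers (30 participants, 5 stimuli, effect sizes) are made up for illustration; for simplicity it fits only a random intercept per participant, with stimulus as a fixed effect – the fully crossed random-effects specification in the formula above would need variance components or a package such as lme4 in R.

```python
# Mixed-model sketch with simulated ratings: fixed effects of fixation
# and stimulus, plus a random intercept per participant.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for pid in range(30):                       # 30 hypothetical participants
    p_eff = rng.normal(0, 0.5)              # participant random intercept
    for stim in range(5):                   # 5 hypothetical stimuli
        for fixation in (0, 1):
            rows.append({"rating": 5 + 0.8 * fixation + p_eff
                                   + rng.normal(0, 1),
                         "fixation": fixation,
                         "stimulus": f"s{stim}",
                         "pid": pid})
df = pd.DataFrame(rows)

model = smf.mixedlm("rating ~ fixation + stimulus", df, groups=df["pid"])
result = model.fit()
```

A single model of this form replaces the set of separate per-stimulus ANOVAs while still allowing different means for each stimulus.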

Experiment 3: “Significant effect of size of either object on launching ratings would disconfirm Michotte's claim. Non-significant effect would be consistent with it.” If the null is being predicted, then Bayes factors would be more appropriate, so that you can make inferences about H0.

Experiments 4, 5, 6, 13, 14: I’m not sure what the advantage of using Tukey post-hocs is here. On my understanding, the appropriate test here is a single contrast testing for a linear trend.
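A linear-trend contrast of this kind needs no specialised software: apply linear contrast weights to each participant’s condition means and test the resulting scores against zero. The sketch below uses simulated data; four equally spaced levels and n = 50 are assumptions for illustration, not values from the manuscript.

```python
# Single linear-trend contrast across four ordered condition levels,
# computed per participant and tested with a one-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_participants, n_levels = 50, 4

# Simulated ratings: each row is one participant's ratings across
# four ordered levels, with a built-in increasing trend.
trend = np.array([4.0, 5.0, 6.0, 7.0])
ratings = trend + rng.normal(0, 1.0, size=(n_participants, n_levels))

# Orthogonal linear contrast weights for four equally spaced levels
weights = np.array([-3.0, -1.0, 1.0, 3.0])

# One contrast score per participant; test the trend against zero
scores = ratings @ weights
t_stat, p_value = stats.ttest_1samp(scores, 0.0)
```

A significant positive t here indicates ratings increasing linearly across the ordered levels, which is the pattern the hypotheses predict.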

Experiments 7, 8, 9, 10: An ANOVA won’t be able to test this hypothesis. Multiple one-sample t-tests against 5 for each condition may be more appropriate.

Experiments 11, 12: The ANOVA seems to be testing for effects of motion and speed (i.e. those are the factors), but the hypotheses pertain to differences between the scale ratings (launching vs pulling, etc.).


Other comments


i. Will each experiment recruit a new (and non-overlapping) sample, or will some participants take part in multiple studies? I think it’s important that participants only take part in one.


ii. P4-5, L95-102: “The perceptual nature of the launching effect is shown by evidence that it can influence other contemporaneous perceptual processing. […] Detection occurred sooner for launching stimuli than for non-causal controls, supporting the hypothesis that causality is constructed at an early stage of perceptual interpretation.”

I don’t think this is correct – differences in breakthrough time do not mean the effect is perceptual. Participants must decide when to report a percept as present versus absent, and in that sense variations in breakthrough times may well be a function of differences in decision thresholds.


iii. P11, L275-277: “It is, however, important to the replication study that participants should, as far as possible, report perceptual impressions and not products of post-perceptual processing”

I think the point being made here is clearer later in the manuscript, but on my first reading it sounded to me as though participants were being asked the impossible – to report on some pre-decisional state of perceptual processing. Could this be reworded to make clear the point that participants are being asked to report what they saw, and not what they think following conscious deliberation?


iv. Power analysis: In my view it’s important to conduct a power analysis for each study. Most or all of the hypotheses can be tested with a t-test, so the calculations should be far more straightforward than powering for the multifactorial ANOVAs.
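For example, powering a one-sample or paired t-test for a smallest effect size of interest is a one-line calculation with statsmodels. The values below (d = 0.3, α = .05, 90% power) are placeholders for illustration, not values taken from the manuscript.

```python
# Required sample size for a one-sample / paired t-test to detect a
# smallest effect size of interest of d = 0.3 with 90% power.
from statsmodels.stats.power import TTestPower

n_required = TTestPower().solve_power(effect_size=0.3, alpha=0.05,
                                      power=0.9, alternative="two-sided")
```

Repeating this per experiment, with each experiment’s own smallest effect size of interest, would give the per-analysis power justification requested above.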


v. Table 2 (Rationale column): why does the smallest effect of interest change in each study? Is this the effect size that would be powered for given n = 50? 

Reviewed by anonymous reviewer 1, 05 Jun 2023

In this paper, the author illustrates a series of experiments aimed at replicating (and extending) the ‘classic’ work conducted by Michotte on causality. As reported by the author, despite the great importance and impact of Michotte’s work, a systematic investigation of his experiments with modern and fully reproducible tools is lacking. The main aim of this registered report is to provide a wide range of experiments to fill this gap.


The paper is surely well written and organised; honestly, I have little to comment on at this stage. A few comments are reported below.


The introduction did an excellent job of revising and illustrating the main characteristics of Michotte’s work. However, I have found it to be quite dense, so I am wondering if some additional subparagraphs could help the reader.

The second and more relevant comment is about the sample size. I fully understand that power analyses may be problematic, but at the same time, I also feel that determining sample size on the basis of an arbitrary decision, albeit a motivated one, can be challenging. So, I am wondering if there is still a possibility to see power analyses reported in this work. I am aware this request implies great effort, so I'd be happy to defer to what the editor (or the other reviewers) think too.

User comments

No user comments yet