Learning cross-modally to suppress distractors
Do task-irrelevant cross-modal statistical regularities induce distractor suppression in visual search?
Recommendation: posted 26 April 2022, validated 26 April 2022
Related stage 2 preprints:
- Advances in Cognitive Psychology
- Experimental Psychology
- Journal of Cognition
- Peer Community Journal
- Psychology of Consciousness: Theory, Research and Practice
- Royal Society Open Science
- Swiss Psychology Open
Zoltan Dienes (2022) Learning cross-modally to suppress distractors. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=155
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #3
DOI or URL of the report: https://osf.io/th9bc/?view_only=c1bf36677deb46cba762f37d7735c09c
Version of the report: v3
Author's Reply, 26 Apr 2022
Decision by Zoltan Dienes, posted 19 Apr 2022
A small revision still needed.
Let me just recap. For your previous power analysis you used d = 0.45 because it was somewhat smaller than the d = 0.60 of a previous study. In the last submission, you justify the minimal effect somewhat better by using the lower limit of a CI for a relevant effect from a previous study. For your first hypothesis you take the lower bound of a 60% CI of a previous study to get d = 0.42. You explain you used a 60% CI because your practical maximum N is 85. For the second hypothesis you use a 95% CI of a previous study relevant to that effect, which gives d = 0.41, so consistent with your practical maximum N.
So what you have done is retrofitted the heuristic by choosing a % for the CI to fit what you could do practically. That is, the real heuristic that you used was to fit to your practical limit (which is scientifically irrelevant). What you need to do is work the other way round - start from the scientific context, and what you can practically do is either sufficient to address the scientific problem or not. If it is not, you would say up front that a non-significant result would not count against the hypothesis of a scientifically relevant effect. Now the % used for the CI is also arbitrary. But there is no scientific reason on the table for why the % should be different for the different problems. Also it is clear that a 60% CI rules out too little in terms of finding the smallest plausible value. I suggest you use an 80% CI for both problems; find the lower limit, and work out your power for both hypotheses with respect to that.
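As an illustration only, the suggested procedure can be sketched in Python. The sketch assumes a within-subject (one-sample or paired) design, a large-sample standard error for Cohen's d, and a normal approximation to the power of a two-sided t-test. The previous-study sample size `prev_n = 24` is a hypothetical value chosen for the example, not a figure from the manuscript; the function names are likewise made up here.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def ci_lower_limit(d, n, level=0.80):
    """Approximate lower limit of a two-sided CI for a within-subject Cohen's d."""
    se = sqrt(1.0 / n + d ** 2 / (2.0 * n))  # large-sample standard error of d
    z = _STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    return d - z * se


def approx_power(d, n, alpha=0.05):
    """Normal-approximation power of a two-sided one-sample/paired t-test.

    Ignores the negligible probability of rejecting in the wrong direction.
    """
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return _STD_NORMAL.cdf(d * sqrt(n) - z_crit)


# Hypothetical previous study: d = 0.60 from n = 24 participants (n is assumed).
prev_d, prev_n = 0.60, 24
planned_n = 85  # practical maximum N mentioned in the letter

# The same CI level is applied throughout; the loop only shows how strongly
# the implied smallest plausible effect depends on that choice.
for level in (0.60, 0.80, 0.95):
    lower = ci_lower_limit(prev_d, prev_n, level)
    power = approx_power(lower, planned_n)
    print(f"{level:.0%} CI lower limit: d = {lower:.2f}, "
          f"approx. power at N = {planned_n}: {power:.2f}")
```

The point of the sketch is the direction of reasoning: the CI level is fixed first and applied to both hypotheses, and the power at the practical N is whatever follows from it. An exact calculation would use the noncentral t distribution and the actual sample sizes of the previous studies.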
One further point, which need not entail any revision to the current manuscript but which, if it does not, should be brought up in your discussion. Your test of awareness is a forced-choice test and does not separate out objective and subjective thresholds. On two common theories of consciousness (higher order and global workspace), unconscious knowledge would allow above-chance performance on your test. On another theory (recurrent processing), your test does measure conscious processing. (See https://osf.io/mzx6t/ ) Thus, finding that the knowledge was above chance on your awareness test would only indicate conscious knowledge given some but not other theories of consciousness.
Evaluation round #2
DOI or URL of the report: https://osf.io/5qvtg/?view_only=c1bf36677deb46cba762f37d7735c09c
Version of the report: v2
Author's Reply, 14 Apr 2022
Decision by Zoltan Dienes, posted 31 Mar 2022
The reviewers are largely happy with your changes. Vadillo raises a couple of points, one of which I want to highlight here - namely how the figure for an effect size of 0.45 in particular can be justified. I realize that in almost all other papers, which are not RRs, no one really justifies the effect sizes used in power analyses. But we as a rule do for Registered Reports. Thus, while I realize you are already running more subjects than typical, there remains the point that a non-significant result only counts against there being any effect of interest for the H1 in question if the study was well powered for detecting such effects. Thus the power analysis is only as good as the reasons relating the minimally interesting effect size to the scientific problem in question. It is only by addressing this problem that you can justify rejecting your H1. One heuristic in the paper I previously referred you to is to use the lower limit of a confidence interval on the effect from relevant previous studies - if the lower limit is still interesting, then there is a case for that being the smallest effect of interest that is plausible (roughly treating the CI as a credibility interval). Or you may think about it some other way. (The Meyen method for equating direct and indirect task performance that Vadillo refers to assumes equal signal to noise ratio for each trial for the tasks, which is implausible - it makes the same assumption for trials that Vadillo points out shouldn't be made for tasks, so repeats the same issue at another level.)
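To make concrete why the chosen effect size matters so much, the following minimal sketch relates effect size, sample size and power. It uses a normal approximation for a two-sided one-sample or paired t-test, so it slightly understates the N an exact calculation would give. The target power of 0.90 and the function names are assumptions made for the example, not values taken from the manuscript.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def required_n(d, power=0.90, alpha=0.05):
    """Approximate N needed to detect effect size d with the given power."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_power) / d) ** 2)


def smallest_detectable_d(n, power=0.90, alpha=0.05):
    """Approximate smallest effect size detectable with the given power at N = n."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    return (z_alpha + z_power) / sqrt(n)


# Effect sizes discussed in the reviews: the original 0.60 and the revised 0.45.
for d in (0.60, 0.45):
    print(f"d = {d:.2f}: approx. N required = {required_n(d)}")

# Reading the relation the other way round for a fixed sample size.
print(f"N = 85: smallest detectable d = {smallest_detectable_d(85):.2f}")
```

A non-significant result can only count against effects at least as large as the smallest detectable one, which is why that value has to be defended on scientific grounds rather than read off from the available N.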
Reviewed by Miguel Vadillo, 29 Mar 2022
Reviewed by anonymous reviewer, 22 Mar 2022
Evaluation round #1
DOI or URL of the report: https://osf.io/9m35p/?view_only=3b7df2ce241d46118776f15b28c4feb0
Author's Reply, 17 Mar 2022
Decision by Zoltan Dienes, posted 23 Feb 2022
I now have two reviews from experts about your submission. Both reviewers are overall positive, but they make a number of points that will need addressing in a revision. I want to draw your attention to three points in particular based on both my own reading and the reviewers' reactions, though all the reviewers' points need a response:
1) Align your statistical tests with the hypotheses tested. Vadillo asks about your ANOVAs. Note your Design Table does not refer to the ANOVAs, but to particular t-tests. Indeed, a valuable feature of the Registered Report format is you can plan precisely the contrast needed to test each hypothesis in advance. In order to limit inferential flexibility, other tests should typically not be specified. That is, you do not need to specify omnibus ANOVA tests in order to justify the particular test of a hypothesis; one just specifies the exact contrast that tests each hypothesis. Further, in order to limit inferential flexibility, you should use just one system of inference: You could do frequentist t-tests or Bayesian ones; but pick one as the one you will do and from which inferences will follow.
2) Power/sensitivity should be specified for each test with justification of the effect size chosen. Thus, if you use frequentist tests, you need to justify a minimally interesting effect size that is scientifically relevant for each test, then determine power for that test, indicating the power for each test. Vadillo asks where d = 0.6 comes from. See here for how to approach the problem of specifying an effect size for power. On the other hand you might decide to use Bayesian t-tests. Then you should justify the rough size of effect expected for each test; see previous reference for this too. The use of default scale factors especially for tests with few trials, like your awareness test, can lead to spurious support for H0 (see here).
3) Vadillo also questions the sensitivity of your test of awareness. This point is related to the previous ones. You need an appropriate sensitivity analysis of every test you conduct - and you also need to list your awareness test in the design table (remove descriptions in the text of tests that you don't list in the design table, in order to keep inferential flexibility under control; you can always report these other tests in a non-preregistered results section in the final manuscript). See the "calibration" section of this paper for how to determine an expected effect size for an awareness test, or else the reference I gave at the end of point 2). The reviewer also brings up what the proper chance level of your measure is. Chance performance would be above zero.
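The concern about chance level and sensitivity in an awareness test with few trials can be illustrated with an exact binomial calculation. This is only a sketch for a single participant: the eight trials, the four-alternative format (chance = 0.25) and the true accuracy of 0.40 are hypothetical values, not the design of the submitted study, and a group-level test would need its own analysis.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_count(n, chance, alpha=0.05):
    """Smallest number correct that is significantly above chance (one-sided)."""
    for k in range(n + 1):
        if binom_upper_tail(k, n, chance) <= alpha:
            return k
    return None  # no count reaches significance with this few trials


def power_above_chance(n, chance, true_p, alpha=0.05):
    """Probability of a significant above-chance score given true accuracy true_p."""
    k = critical_count(n, chance, alpha)
    return 0.0 if k is None else binom_upper_tail(k, n, true_p)


# Hypothetical awareness test: 8 trials, 4 response options, so chance = 0.25.
n_trials, chance = 8, 0.25
print("Expected score under pure guessing:", n_trials * chance)
print("Score needed to exceed chance:", critical_count(n_trials, chance))
print("Power if true accuracy is 0.40:",
      round(power_above_chance(n_trials, chance, 0.40), 2))
```

Two things follow from the sketch. Guessing alone produces a non-zero expected score, so performance has to be compared with that level rather than with zero. And with few trials the test has low power against modest above-chance knowledge, so a non-significant result says little about the absence of awareness.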
Both reviewers also make points to improve the clarity of your arguments.
I look forward to seeing your revision. Let me know if you have any questions.