
Learning cross-modally to suppress distractors

by Zoltan Dienes, based on reviews by Miguel Vadillo and 1 anonymous reviewer
A recommendation of:

Do task-irrelevant cross-modal statistical regularities induce distractor suppression in visual search?

Submission: posted 21 December 2021
Recommendation: posted 26 April 2022, validated 26 April 2022
Cite this recommendation as:
Dienes, Z. (2022) Learning cross-modally to suppress distractors. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=155

Recommendation

There are two fundamental processes that the brain engages in: statistical learning and selection. Indeed, past work has shown that these processes often come together: people can use a task-irrelevant stimulus to predict a target stimulus even across modalities (cross-modal statistical learning), thereby enhancing the processing of the target stimulus (selection). Further, people can learn where a distractor will be in order to suppress it efficiently (selecting out), using task-irrelevant stimuli in the same modality (within-modality statistical learning).
 
In the current study, Jagini and Sunny will test whether people can learn where a distractor stimulus will be, in order to suppress it (selecting out), using a task-irrelevant stimulus from a different modality (cross-modal statistical learning). They will also test whether people can express awareness of the relation between the predictive task-irrelevant stimulus and the location of the distractor on a forced-choice test. On some (but not other) theories of consciousness, such a test measures conscious knowledge of the association.
 
The Stage 1 manuscript was evaluated over two rounds of in-depth review. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
 
URL to the preregistered Stage 1 protocol: https://osf.io/qjbmg
 
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA. 
 
 
References
 
1. Jagini, K. K. & Sunny, M. M. (2022). Do task-irrelevant cross-modal statistical regularities induce distractor suppression in visual search? Stage 1 Registered Report, in-principle acceptance of Version 4 by Peer Community in Registered Reports. https://osf.io/qjbmg
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Evaluation round #3

DOI or URL of the report: https://osf.io/th9bc/?view_only=c1bf36677deb46cba762f37d7735c09c

Version of the report: v3

Author's Reply, 26 Apr 2022

Decision by Zoltan Dienes, posted 19 Apr 2022

Dear Kishore

A small revision is still needed.

Let me just recap. For your previous power analysis you used d = 0.45 because it was somewhat smaller than the d = 0.60 of a previous study. In the latest submission, you justify the minimal effect somewhat better by using the lower limit of a CI for a relevant effect from a previous study. For your first hypothesis you take the lower bound of a 60% CI from a previous study to get d = 0.42, explaining that you used a 60% CI because your practical maximum N is 85. For the second hypothesis you use a 95% CI of a previous study relevant to that effect, which gives d = 0.41, again consistent with your practical maximum N.


So what you have done is retrofit the heuristic by choosing a % for the CI to fit what you could do practically. That is, the real heuristic you used was to fit to your practical limit (which is scientifically irrelevant). What you need to do is work the other way round - start from the scientific context, and then what you can practically do is either sufficient to address the scientific problem or not. If it is not, you would say up front that a non-significant result would not count against the hypothesis of a scientifically relevant effect. Now, the % used for the CI is also arbitrary, but there is no scientific reason on the table for why the % should differ across the different problems. It is also clear that a 60% CI rules out too little when estimating the smallest plausible value. I suggest you use an 80% CI for both problems: find the lower limit, and work out your power for both hypotheses with respect to that.
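To make that concrete, here is a minimal sketch of the procedure in Python. The previous study's sample size (n = 30), the normal approximation for the CI on d, and the directional test are illustrative assumptions, not values taken from your manuscript; only d = 0.60, N = 85 and alpha = .02 come from the submission and reviews.

```python
# Minimal sketch of the suggested heuristic, with hypothetical inputs: the
# previous study's effect (d = 0.60) and an assumed sample size (n = 30),
# a normal-approximation CI on d, then the power that the practical maximum
# of N = 85 gives for the resulting smallest plausible effect.
import numpy as np
from scipy.stats import norm
from statsmodels.stats.power import TTestPower

d_prev, n_prev = 0.60, 30                                 # n_prev is purely illustrative
se_d = np.sqrt(1 / n_prev + d_prev**2 / (2 * n_prev))     # approximate SE of a within-subject d
d_lower = d_prev - norm.ppf(0.90) * se_d                  # lower limit of an 80% CI

# Power of a directional one-sample (paired) t-test at the protocol's alpha of .02.
power = TTestPower().power(effect_size=d_lower, nobs=85,
                           alpha=0.02, alternative='larger')
print(f"80% CI lower limit: d = {d_lower:.2f}; power at N = 85: {power:.2f}")
```

The same two steps - lower limit, then power at your practical maximum N - would simply be repeated for the second hypothesis with its own previous-study effect.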

One further point, which need not entail any revision to the current manuscript but should be brought up in your discussion if not: Your test of awareness is a forced-choice test and does not separate objective and subjective thresholds. On two common theories of consciousness (higher order and global workspace), unconscious knowledge would allow above-chance performance on your test. On another theory (recurrent processing), your test does measure conscious processing. (See https://osf.io/mzx6t/ ) Thus, finding that the knowledge was above chance on your awareness test would only indicate conscious knowledge given some but not other theories of consciousness.

best

Zoltan

Evaluation round #2

DOI or URL of the report: https://osf.io/5qvtg/?view_only=c1bf36677deb46cba762f37d7735c09c

Version of the report: v2

Author's Reply, 14 Apr 2022

Decision by Zoltan Dienes, posted 31 Mar 2022

Dear Kishore

The reviewers are largely happy with your changes. Vadillo raises a couple of points, one of which I want to highlight here - namely, how the figure of an effect size of 0.45 in particular can be justified. I realize that in almost all other papers, which are not RRs, no one really justifies the effect sizes used in power analyses. But as a rule we do for Registered Reports. Thus, while I realize you are already running more subjects than is typical, there remains the point that a non-significant result only counts against there being any effect of interest for the H1 in question if the study was well powered to detect such effects. The power analysis is therefore only as good as the reasons relating the minimally interesting effect size to the scientific problem in question. It is only by addressing this problem that you can justify rejecting your H1. One heuristic in the paper I previously referred you to is to use the lower limit of a confidence interval on the effect from relevant previous studies - if the lower limit is still interesting, then there is a case for that being the smallest plausible effect of interest (roughly treating the CI as a credibility interval). Or you may think about it some other way. (The Meyen method for equating direct and indirect task performance that Vadillo refers to assumes an equal signal-to-noise ratio per trial for the two tasks, which is implausible - it makes the same assumption for trials that Vadillo points out should not be made for tasks, so it repeats the same issue at another level.)

best

Zoltan

Reviewed by Miguel Vadillo, 29 Mar 2022

The authors have done an excellent job at addressing my comments to the previous version. I appreciate in particular that they are now willing to test a substantially larger number of participants and that the ms now addresses the question of the low sensitivity in awareness tests. I only have relatively minor comments to the present version:

On page 7 the authors write “We hypothesise that if the participants are aware of the the relationship between auditory and visual distractor location regularities, we expect that the score received by each location linearly decreases from its distance from the actual HpValD location”. The sentence sounds a bit mysterious because the reader still has no clue as to how locations will be scored. This is not explained until page 15.

The new power calculations basically assume that the expected effect size is roughly the same for reaction times and for the awareness test (i.e., d = 0.45). But the former are measured over hundreds of trials and the latter over just six questions. Implicitly, this means that the authors expect each question of the awareness test to be much more sensitive and informative than each trial of the visual search task, which is an arguable assumption, in my opinion. I am not asking the authors to make any change in the ms regarding this; I am just trying to highlight a recurrent problem in this area of research. There is a great paper about this problem by Sascha Meyen in JEP: General. https://www.tml.cs.uni-tuebingen.de/team/luxburg/publications/MeyenEtal2021.pdf
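As a rough illustration of why this is a strong assumption, the sketch below treats trials as independent and equally noisy within a participant, with negligible between-subject variance in the true effect - a deliberate oversimplification, and the trial count of 600 is only a stand-in for "hundreds of trials":

```python
# Back-of-the-envelope sketch: if an aggregate effect of d = 0.45 is expected
# from k items, the implied per-item effect scales roughly as d / sqrt(k)
# under the simplifying assumptions stated above.
import math

d_aggregate = 0.45
for label, k in [("visual search RTs", 600), ("awareness test", 6)]:
    per_item_d = d_aggregate / math.sqrt(k)
    print(f"{label}: {k} items -> implied per-item d ≈ {per_item_d:.3f}")
# Each awareness question would have to carry roughly ten times the per-item
# signal of a search trial for the two aggregate d's to come out equal.
```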

 

 

Reviewed by anonymous reviewer 1, 22 Mar 2022

The author has answered my questions adequately. I have no further comments. 

Evaluation round #1

DOI or URL of the report: https://osf.io/9m35p/?view_only=3b7df2ce241d46118776f15b28c4feb0

Author's Reply, 17 Mar 2022

Decision by Zoltan Dienes, posted 23 Feb 2022

Dear Kishore

I now have two reviews from experts about your submission.  Both reviewers are overall positive, but they make a number of points that will need addressing in a revision.  I want to draw your attention to three points in particular based on both my own reading and the reviewers' reactions, though all the reviewers' points need a response:

1) Align your statistical tests with the hypotheses tested. Vadillo asks about your ANOVAs. Note that your Design Table does not refer to the ANOVAs, but to particular t-tests. Indeed, a valuable feature of the Registered Report format is that you can plan in advance precisely the contrast needed to test each hypothesis. In order to limit inferential flexibility, other tests should typically not be specified. That is, you do not need to specify omnibus ANOVA tests in order to justify the particular test of a hypothesis; one just specifies the exact contrast that tests each hypothesis. Further, in order to limit inferential flexibility, you should use just one system of inference: you could do frequentist t-tests or Bayesian ones, but pick one as the one you will do and from which inferences will follow.

2) Power/sensitivity should be specified for each test, with justification of the effect size chosen. Thus, if you use frequentist tests, you need to justify a minimally interesting effect size that is scientifically relevant for each test, then determine and report the power for that test. Vadillo asks where d = 0.6 comes from. See here for how to approach the problem of specifying an effect size for power. On the other hand, you might decide to use Bayesian t-tests. Then you should justify the rough size of effect expected for each test; see the previous reference for this too. The use of default scale factors, especially for tests with few trials like your awareness test, can lead to spurious support for H0 (see here).

3) Vadillo also questions the sensitivity of your test of awareness. This point is related to the previous ones. You need an appropriate sensitivity analysis for every test you conduct - and you also need to list your awareness test in the design table (remove from the text descriptions of tests that you don't list in the design table, in order to keep inferential flexibility under control; you can always report these other tests in a non-preregistered results section of the final manuscript). See the "calibration" section of this paper for how to determine an expected effect size for an awareness test, or else the reference I gave at the end of point 2. The reviewer also brings up what the proper chance level of your measure is: chance performance would be above zero.

Both reviewers also make points to improve the clarity of your arguments.

I look forward to seeing your revision. Let me know if you have any questions.

 

best

Zoltan

Reviewed by anonymous reviewer 1, 26 Jan 2022

The registered report “Do task-irrelevant cross-modal statistical regularities induce distractor suppression in visual search? ” is well written and has a valid research question. The logic and rationale of the proposed hypotheses are clear. Also, the methodology is sound, clear and replicable. The authors have also considered additional outcome-neutral conditions for manipulation checks. Some concerns are listed below.

On p. 5, at the end of the first paragraph, the authors claim that there seems to be enough evidence to support the idea that our brain learns and utilizes statistical regularities of both task-relevant and task-irrelevant sensory stimuli to optimize behaviour. Given that the preceding part mainly introduced the influence of statistical regularities of salient distractors, the authors should make clearer what the evidence is for task-relevant and for task-irrelevant stimuli, respectively.


The authors have included Chen et al.'s crossmodal contextual cueing studies, which are quite relevant. However, I suggest the authors consult more of the crossmodal selective attention literature; for example, Charles Spence's lab has done many studies on this topic.


For the data analysis, the significance level alpha is set to 0.02. Why not use the commonly used alpha level of .05, which better balances Type I and Type II errors?

Overall, this is an interesting research proposal and I would like to see the outcome in the near future. 

Reviewed by Miguel Vadillo, 01 Feb 2022

The main goal of the two experiments proposed in this RR is to explore whether distractor inhibition in the additional singleton task can be modulated by contextual auditory information. Specifically, the singleton distractor will be presented more frequently in two particular locations, each of them cued by a distinctive sound. The question is whether participants will learn to use this sound to suppress attention to the location where the singleton distractor is most likely to appear on that trial. Although this type of contextual modulation has been explored in other visual statistical learning paradigms, this is the first time that such an effect is tested in the additional singleton task. I think it can be quite interesting for researchers working in this area. I have relatively minor comments about the general procedure, design and context that I think should be easy to tackle in a revised version.

 

The authors (and editor) won't be surprised to find out that I am a bit concerned about the awareness test included at the end of the experiment and the type of conclusions that can be drawn from it. The awareness tests included in this paradigm are almost always doomed to suggest that learning was unconscious. Participants' learning is assessed over hundreds of visual search trials using a continuous measure (reaction times), but their awareness is assessed briefly in 2-4 yes/no questions. Logically, this procedure is much more sensitive to a significant effect in reaction times than to an equivalent effect in awareness, introducing a strong bias toward concluding that learning was unconscious. To be completely honest, this is not the authors' fault: this kind of awareness test is common in the area, but it happens to be highly misleading, and a well-powered and carefully designed RR should avoid these shortcomings by all means. I am definitely not asking the authors to cite these papers, but to better appreciate these problems they might find it useful to read our papers addressing this question in contextual cueing (Vadillo et al., 2016, PBR), location probability learning (Vadillo et al., 2020, JEP:Gen) and the additional singleton task (Vicente-Conesa et al., 2021, https://psyarxiv.com/yekvu/). In the particular case of location probability learning and the additional singleton task it is quite difficult to improve the sensitivity of the awareness test, because it is essentially a one-shot test, i.e., one cannot include more and more testing trials to improve its sensitivity, as can be done, for instance, in the contextual cueing task. But at least one can try to complement the traditional yes/no dichotomous responses with continuous and potentially more sensitive measures. For instance, in the location probability learning task we have found that asking participants to rate the percentage of times the target has appeared in each quadrant is a more sensitive test than simply asking them to select a quadrant (e.g., Giménez-Fernández et al., 2020, JEP:HPP). It is still a quite suboptimal measure, but slightly more sensitive. The authors could also consider replacing their yes/no or discrete-choice responses with confidence ratings or any other response that provides a more nuanced and graded measure of awareness.

 

In any case, even if the authors decide to stick to this procedure (which I strongly advise against), I would still ask them to describe in much more detail what analyses they are planning to run on their awareness data. For the first and third questions (which are not very informative; this particular type of subjective rating is known to conflate "unawareness" with a conservative bias; see Fleming and Lau, 2014, https://www.frontiersin.org/articles/10.3389/fnhum.2014.00443/full), the authors plan to estimate the proportion of "yes" responses. But for the second and fourth questions I don't think they provide sufficient information to understand how they are planning to process and analyze the responses. They say that they will calculate the "distance between the locations indicated by the participant and the actual locations". But how are they planning to do this? Recall that there are actually two high-probability locations and participants select two locations. Let's imagine that the actual locations are, say, 1 and 5, and the participant chooses 2 and 3, for instance. How is this reduced to a single distance score? In addition, the authors say that they will analyze these scores by comparing them against zero. But I can't understand the logic of this analysis. Shouldn't the authors compare the observed score against the score that would ideally be observed if responses were completely random? I think the authors need to provide much more detail here.

 

Power. The sample size was calculated to provide reasonable power to detect a d = 0.6 effect. But why is this effect size a good reference? I am sure that distractor suppression in the additional singleton task is usually much larger than this, but do we have any evidence to expect that the contextual (auditory) modulation of the effect, if true, will be that large? In relation to my previous note, the study also plans to determine whether learning was unconscious. Knowing that responses to the awareness test are likely to be quite noisy (see my previous paragraphs), would N = 39 be enough to test this hypothesis with sufficient power? I honestly doubt it. Our meta-analysis of awareness in the contextual cueing task yielded an average effect of dz = .31, and there are good reasons to suspect that the typical awareness test in the contextual cueing task is much more sensitive than the traditional test in the additional singleton task (e.g., it usually includes around 24 trials instead of one-shot responses). For the probabilistic cuing task we found an average awareness effect of h = .35, which requires at least 64 participants to reach just 80% power.
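For reference, the h = .35 figure follows from the standard normal-approximation sample-size formula for detecting an effect h against chance; a quick check (a two-sided test at alpha = .05 is assumed here, not taken from the manuscript):

```python
# Sample size needed to detect Cohen's h = .35 with 80% power, using the
# standard normal-approximation formula n = ((z_{1-a/2} + z_{power}) / h)^2.
# A two-sided test at alpha = .05 is assumed.
from scipy.stats import norm

h, alpha, power = 0.35, 0.05, 0.80
n = ((norm.ppf(1 - alpha / 2) + norm.ppf(power)) / h) ** 2
print(f"n ≈ {n:.1f}")   # ≈ 64, consistent with the figure above
```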

 

Minor comments

 

“task-irrelevant” -> the auditory stimuli are characterized in the ms as “task-irrelevant”. But given that they actually convey useful information (i.e., where the distractor is going to appear) I wonder if the name is actually fair. Wouldn’t it be better to refer to these stimuli as “contextual” stimuli instead?

 

At several points, the ms gives the impression that the studies explore how participants “anticipate” the distractors, i.e., how they are “perceptually suppressed by pro-active modulations” (p. 6). If the goal is to study anticipatory behavior, shouldn’t the auditory signals be presented before the search display instead of simultaneously?

 

p. 7 “HpValD” and “HpInVald”. The meaning of these acronyms only becomes clear in the following pages. Wouldn’t it be easier to understand the Hypotheses section if the previous paragraphs introduced the design briefly, including the condition names?

 

p. 7 I found it a bit weird that the authors present Hypothesis 1:2 as an additional hypothesis. It is simply the negation of Hypothesis 1:1, isn’t it? Same comment in the Study Design Table.

 

p. 7 “… the former condition associated with the search trials… should produce fater RTs” Faster compared to what? Same problem in the following sentence.

 

p. 13. Note that the singleton distractors appear much more frequently in the two high-probability locations than in any of the other locations. This is unavoidable, of course, but it is important to remember that it renders some of the statistical comparisons meaningless. For instance, there is no reason to compare either HpValD or HpInvalD with the low-probability distractor location. Any significant difference could be due either to the fact that the sound is unpredictive in the latter condition or to the fact that the distractor has appeared in a location where it seldom appears. It will be quite difficult to interpret this result. So, I wonder whether it makes sense to include all four conditions in a single ANOVA (p. 17).

 

p. 19. In the awareness test, will participants understand what “colored non-target locations” are?

 

Signed,

Miguel Vadillo