Can the visual cortex maintain information in the short term?

based on reviews by Robert McIntosh, Evie Vergauwe and Vincent van de Ven
A recommendation of:

Causal evidence for the role of the sensory visual cortex in visual short-term memory maintenance

Submission: posted 10 October 2021
Recommendation: posted 05 June 2022, validated 06 June 2022
Cite this recommendation as:
Dienes, Z. (2022) Can the visual cortex maintain information in the short term? Peer Community in Registered Reports.


According to the sensory recruitment framework, the visual cortex is at least in part responsible for maintaining information about elementary visual features in visual short-term memory. Could an early visual area, constantly taking in new information, really be responsible for holding information for up to a second? But conversely, could higher-order regions, such as frontal regions, really hold subtle sensory distinctions? It must be done somewhere. Yet the existing evidence is conflicting. Phylactou et al. seek to address this question by applying transcranial magnetic stimulation (TMS) to disrupt early visual areas at intervals up to a second after stimulus presentation, to determine the effect on visual short-term memory performance. In this way they will causally influence the sensory cortex at relevant times while tightening up on possible confounds in earlier research.
The Stage 1 manuscript was evaluated over two rounds of in-depth review by three expert reviewers. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol:
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA. 
List of eligible PCI RR-friendly journals:
1. Phylactou, P., Shimi, A. & Konstantinou, N. (2022). Causal evidence for the role of the sensory visual cortex in visual short-term memory maintenance, in principle acceptance of Version 5 by Peer Community in Registered Reports.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Evaluation round #4

DOI or URL of the report:

Version of the report: 8

Author's Reply, 04 Jun 2022

Decision by , posted 03 Jun 2022

Dear Dr Phylactou

Thank you for your revision addressing the reviewers' points.  Just a couple of very minor things:

1) At the end of the introduction you say ", exploratory analyses will investigate any temporal differences between the proposed timing conditions." Delete this clause, because declaring in advance what you will explore muddies the waters concerning the distinction between non-preregistered and preregistered analyses. That is, at Stage 1, one does not mention analyses that will be exploratory.

2) Thank you for your new simulations. Just be clearer about the results, both in the text and the design table; namely, say when H1 is assumed (g = 0.58, etc.) what proportion of Bs are > 3 and also < 1/3; and when H0 is assumed, what proportion of Bs are > 3 and < 1/3.






Evaluation round #3

DOI or URL of the report:

Version of the report: 7

Author's Reply, 01 Jun 2022

Decision by , posted 12 May 2022

Dear Dr Phylactou,

Sorry for the delay in getting back; because one of the original reviewers was busy, I asked another, Rob McIntosh, to judge how well you had addressed the original reviewers' points. Both reviewers are very happy with how thoroughly you have revised the manuscript. McIntosh has some very useful points of clarification that I ask you to address.

I have one further point of my own, concerning your scale factors. Note that when the meta-analysis you base your scale factors on reports an effect size of e.g. 0.8, that implies the true effect size may well be larger than 0.8. Let us say the 0.8 was significant at just p < .05; then the meta-analytic 95% confidence interval would extend to 1.60. Your logic of using small scale factors so that the plausible upper limit is about the mean meta-analytic effect size does not take this fact into account. One more thing: can you simulate, assuming either that the population effect size is zero or that it is the meta-analytic mean, what proportion of times your BF would exceed 6 or fall below 1/6, given your maximum N of 40? In other words, assuming the theory is false (effect = 0), what is the probability that you would find evidence against the theory (BF < 1/6)? And given the theory is true, what is the probability that you would find evidence for it? This is a check that your maximum N is reasonable given your models of H0 and H1 and the requirement of test severity.
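The calibration check requested here can be sketched in a few lines. This is purely an illustrative simulation, not the authors' pipeline: the half-normal prior, the scale factor of 0.58, and the normal approximation to the sampling distribution of the standardized effect are all assumptions made for the example.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def bf10(d_hat, n, scale):
    """Bayes factor for an observed standardized effect d_hat.

    H1: half-normal prior on the true effect with the given scale
    (Dienes-style); H0: point null at zero.  The sampling distribution
    of d_hat is approximated as Normal(true effect, 1/sqrt(n)).
    """
    se = 1.0 / np.sqrt(n)
    d = np.linspace(0.0, 6.0 * scale, 1000)          # effective prior support
    m1 = trapezoid(stats.halfnorm.pdf(d, scale=scale)
                   * stats.norm.pdf(d_hat, loc=d, scale=se), d)
    m0 = stats.norm.pdf(d_hat, loc=0.0, scale=se)    # likelihood under H0
    return m1 / m0

def calibrate(true_d, n=40, scale=0.58, sims=2000, seed=0):
    """Proportions of simulated studies reaching BF > 6 and BF < 1/6."""
    rng = np.random.default_rng(seed)
    d_hats = rng.normal(true_d, 1.0 / np.sqrt(n), size=sims)
    bfs = np.array([bf10(x, n, scale) for x in d_hats])
    return (bfs > 6).mean(), (bfs < 1 / 6).mean()

hit_h1, miss_h1 = calibrate(true_d=0.58)   # theory true (meta-analytic mean)
false_h0, hit_h0 = calibrate(true_d=0.0)   # theory false
print(f"H1 true:  P(BF > 6) = {hit_h1:.2f}, P(BF < 1/6) = {miss_h1:.2f}")
print(f"H0 true:  P(BF > 6) = {false_h0:.2f}, P(BF < 1/6) = {hit_h0:.2f}")
```

The point of the check is the last line: if P(BF < 1/6) under H0 is low at the maximum N, the design cannot deliver severe evidence against the theory, whatever the data.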




Reviewed by , 03 Mar 2022

The authors have done a great job in revising the manuscript. Regarding the concerns I had raised, they are all adequately and convincingly addressed, either by revising specific sections in the manuscript or by providing a reasonable justification as to why it was decided not to modify the manuscript. I have only one minor comment: some of the newly added sentences are very long (and difficult); e.g., from the end of p. 8 into the beginning of p. 9 there is a six-line sentence. Other than that, I have no further comments and look forward to the results and the publication!

Reviewed by , 11 May 2022

Due to the unavailability of the original Reviewer#2, I have been asked to assess whether this reviewer’s comments have been adequately addressed by revisions and accompanying responses. I have read the review history for the paper, as well as the latest tracked version of the paper, and the responses document, with a focus on the responses to Reviewer#2’s comments.

Overall, the review comments have been very thoroughly addressed. Points 1-3, and 6-9 all involve clarifications and/or methodological adjustments that the authors have responded to fully and clearly. Where the suggestions made to points 4-5 have not been followed, an adequate rationale has been given for not doing so, and a reasonable case made that it is not essential to the experiment. But I would emphasise strongly, with reference to (4), that the decision not to pre-register any tests of interactions by time-period makes it crucial that the Stage 2 report must drive major conclusions from the pre-registered tests, and not from exploratory follow-up analyses involving interactions by time.

I will add some minor comments of my own but these are by way of discussion only. At this stage of review, it would probably be unwelcome to bring up too many novel criticisms, and this is not the task that the editor has given me. But in case it is of any use...

As a general evaluation, I would say that this is a complex pair of experiments, and that the inter-relationship between them is not easy to follow from the text alone. The Design table is completely essential to make sense of the logical structure, but this table does not cover the full range of possible outcomes. For instance, it is not clear what should be concluded if the alternative hypothesis is supported for H2 and/or H4, but not for H5; or if the equivalent situation were to arise for H3/H6 and H7; or if the alternative hypothesis is supported for key hypotheses in either study but the outcome neutral H1 has not been supported. Similarly, the multiple testing of related hypotheses presents the possibility that some comparisons will find evidence for a conclusion whilst other tests of the same hypothesis will not, and it is not clear how conclusions will be drawn under these circumstances (the attitude to multiple comparisons for these Bayesian tests, in which inferential thresholds have been applied, has not been explicitly described). In short, the design of the experiment does not nail down all the interpretative degrees of freedom. Every effort should be made to do so if further modifications to the Stage 1 manuscript are required.

My strong impression is that Experiment 2 seems to be the critical one, and that a simpler and stronger Stage 1 report would be possible if Experiment 1 had already been run as a first (non-preregistered) step, or even if an incremental preregistration approach were taken to the two experiments. However, this has not been done, and I do not suspect that the authors wish to change course at this point.

Other than these general points, I note only a couple of minor clarifications:

Abstract: “Behavioural effects in the ipsilateral occipital hemisphere to visual hemifield will indicate a causal involvement of the sensory visual cortex during a specific temporal point in VSTM.” >> Here, and elsewhere in the manuscript, care must be taken not to imply that the specificity of any effect to a given time point will be tested, because this is not part of the preregistered analysis plan. (The last two paragraphs of the Introduction run the same risk, and more careful language may be needed to make sure the reader knows clearly what the pre-registered components will test, and to keep this distinct from subsequent exploration.)

Introduction: “For example, Rademaker et al. (2017) interfered with sensory visual cortex TMS at 0 ms and 900 ms into the delay period, van Lamsweerde et al. (2017), at 0 ms, 100 ms, and 200 ms, and van de Ven et al. (2012) at 100 ms, 200 ms, and 400 ms.” >> This paragraph needs more definition to make sure the reader knows what the zero point of the delay period means (is it the moment of stimulus offset?), and how long the typical memory delay is.

Introduction: “Based on a recent meta-analysis examining the effects of TMS on VSTM performance during the maintenance period, most studies differentiated between earlier (up to 200 ms into the maintenance period) and later (after 200 ms; usually at the middle of the maintenance period) stimulation (Phylactou et al., 2021).” >> It would be helpful to know what this meta-analysis found regarding the size of the effects of early and late TMS.

Overall, this is an interesting and generally well-motivated and well-specified experiment, addressing a topic of pretty wide interest to Cognitive Neuroscience. I wish the authors well in conducting the experiments.

Rob McIntosh, University of Edinburgh, UK.

Evaluation round #2

DOI or URL of the report:

Version of the report: 2.0

Author's Reply, 15 Feb 2022

Decision by , posted 13 Dec 2021

Dear Dr Phylactou


I now have detailed reviews back from two experts, both of which have made very useful points.  They raise issues of, amongst other things, consistency within the document regarding analyses and predictions; clarity of stopping rule; possible other outcome neutral tests and other checks (e.g. regarding phosphenes as a measure of drift); and clarity concerning the TMS procedure.

On the analysis, I am in favour of planned comparisons, as in a Registered Report one wants to remove not only all analytic flexibility but also inferential flexibility in interpreting the analyses, so that the use of auxiliary assumptions is not biased, and this is made much easier by planned comparisons. Nonetheless, both reviewers queried whether you wished to make no comment about the difference between timings. I am not saying you should test such differences, and exploratory analyses can go in a separate exploratory section; but the question is what contrasts directly test substantial theory, as those contrasts will inform e.g. the main conclusions in the abstract (and the discussion). The same point about what can go in exploratory analyses, and therefore not be foregrounded in conclusions, applies to e.g. the analysis of RTs. Just bear in mind, in responding to reviewers on this issue, that in a Registered Report, to keep things clean, one does not pre-register exploratory analyses.

In terms of motivating your scale factors for the Bayes factors, I do not understand the main motivating sentence "The width parameter of each prior was calculated to correspond to the 90% probability of the effect size lying within the standardised differences (Hedge’s g) of accuracies and signal detection estimates between sensory visual cortex TMS and a control condition reported in a recent meta-analysis on the topic (Phylactou et al., 2021)." Do you mean you used a 90% CI on the difference scores? Or the 10th percentile of the differences distribution? Bear in mind that a Bayes factor requires a roughly expected effect size for the scale factor (not e.g. a minimal meaningful effect); the meta-analytic effect sizes you quote are large compared to your scale factors, so why not just use the relevant meta-analytic effects? Additionally, you might consider reporting a "Robustness Region" for each BF to show how robust it is to changes in scale factors (as defined here); I leave this final point up to you.
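A Robustness Region in this sense can be obtained by re-evaluating the Bayes factor over a range of scale factors and reporting the interval over which the qualitative conclusion is unchanged. A minimal sketch, assuming a half-normal prior on the effect and illustrative numbers (observed effect 0.45 from n = 30) that are not taken from the manuscript:

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def bf10(d_hat, n, scale):
    """Half-normal-prior Bayes factor (normal approximation to d_hat)."""
    se = 1.0 / np.sqrt(n)
    d = np.linspace(0.0, 6.0 * scale, 1000)
    m1 = trapezoid(stats.halfnorm.pdf(d, scale=scale)
                   * stats.norm.pdf(d_hat, loc=d, scale=se), d)
    return m1 / stats.norm.pdf(d_hat, loc=0.0, scale=se)

# Illustrative numbers: observed standardized effect 0.45 from n = 30.
d_hat, n = 0.45, 30
scales = np.linspace(0.05, 2.0, 40)
bfs = [bf10(d_hat, n, s) for s in scales]

# Robustness Region: scale factors for which the conclusion (BF10 > 3) holds.
ok = [s for s, b in zip(scales, bfs) if b > 3]
print(f"BF10 at scale 0.58: {bf10(d_hat, n, 0.58):.2f}")
if ok:
    print(f"BF10 > 3 holds for scales in [{min(ok):.2f}, {max(ok):.2f}]")
```

A wide region means the inference does not hinge on the particular scale factor chosen; a narrow one flags exactly the sensitivity the recommender is querying.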



Reviewed by , 16 Nov 2021

1A. The scientific validity of the research question(s)

The proposed study is concerned with the role of the sensory cortex in visual short-term memory. In particular, the study aims to test whether the sensory cortex plays a causal role in the short-term maintenance of visual information, which is a highly relevant question that follows directly from the existing scientific literature and evidence base. The question is defined with sufficient precision to be answered through appropriate experiments.

1B. The logic, rationale, and plausibility of the proposed hypotheses (where a submission proposes hypotheses)

The study aims at testing a clearly-defined hypothesis: that of whether the visual cortex plays a causal role in visual short-term maintenance. If that is the case, then interfering with the activity of the visual cortex during short-term maintenance should have an impact on visual short-term memory performance. The hypothesis follows directly from the research question and the currently-available theory in the field. The hypothesis is very popular in the field, and much recent research has aimed at testing it, either directly or indirectly. However, there seems to be some inconsistency between the text and Table 1 when it comes to the exact description of the hypotheses: 1) the text seems to argue that the authors aim at testing that there will be *no* difference between the ipsilateral and contralateral conditions (see p. 8, line 7 from the bottom), even when TMS is applied during perceptual processing (i.e., the outcome-neutral condition), whereas the table states “Given the established role of the sensory visual cortex during visual perception, we hypothesize that evidence *for* a difference between the ipsilateral and contralateral conditions will be present when sensory visual cortex TMS is induced at 0 ms”. 2) in the text, it is explained on p. 10 that, for the outcome-neutral condition, “a significant drop in VSTM performance (decreased detection sensitivity) is expected in the ipsilateral compared to the contralateral condition.” This is a hypothesis with a clear direction: decreased detection sensitivity.
However, in Table 1, the corresponding t-test appears to be two-sided, and it appears that “any” difference between ipsilateral and contralateral conditions would be considered as evidence for the hypothesis, with sample means used to indicate “whether the effects of TMS are inhibitory (sample mean < 0) or facilitatory (sample mean > 0).” Finally, one thing that was not considered is the idea that information in visual working memory can be represented in a continuous and a categorical way – it is not entirely clear whether, at the theoretical level, the authors aim specifically at the continuous memory representations or at both.

1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable)

Overall, the methodology and proposed analysis pipeline appear feasible and sound. I am not an expert in TMS, so I cannot speak to the technical details of the proposed TMS procedure. 1) As a general comment on the proposed method, the authors argue, in the introduction, that the inconclusive findings in the literature are due to several important methodological issues with these studies and pinpoint two of them: binocular presentation (as opposed to monocular presentation) and complex stimuli (as opposed to simple stimuli). However, this point is made rather superficially; it would be more convincing to explain in more detail how these issues could account for the contradictory findings of study x vs. study y. This, in turn, would support the proposed method as the best way to address the issue and to obtain a conclusive answer to the question. 2) Bayesian sequential hypothesis testing is proposed, which is appropriate. However, the exact wording could be more precise, so as to avoid any ambiguity. For example, it is explained that “Sample updating with a stopping rule set at BF10 > 6 or < 1/6 for all three paired t-tests.” It may be useful to explain in detail that this means that, if one of the three t-tests does not reach the predefined criterion, testing will continue (if I understood correctly). Also, a minimum of 20 participants and a maximum of 40 participants is mentioned, but it is not entirely clear whether this implies that after a first batch of 20 participants, a second batch of 20 participants will be tested if needed (as opposed to, for example, adding participants in small batches of 5 after the initial batch of 20 participants). 3) As far as data exclusions are concerned, not much detail is provided. Only participants with less-than-optimal color vision are planned to be excluded. It is not clear what will happen in case of technical difficulties leading to loss of part of the data of a given participant. No performance-based exclusions are proposed.
However, one could expect that only participants with a certain level of memory performance (better than guessing) would be included. For participants who are not really doing the task (i.e., are not trying to remember the orientation), it is difficult to expect that interfering with the visual cortex will impact their performance. 
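The sequential procedure discussed above (start at a minimum N, stop when the BF crosses 6 or 1/6, cap at a maximum N) can be simulated to see how batch size interacts with the stopping rule. A rough sketch under assumed numbers: a half-normal prior with scale 0.58, batches of 5 after an initial 20, and a true effect at the meta-analytic mean; none of these values are taken from the manuscript.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

def bf10(d_hat, n, scale=0.58):
    """Half-normal-prior Bayes factor (normal approximation to d_hat)."""
    se = 1.0 / np.sqrt(n)
    d = np.linspace(0.0, 6.0 * scale, 1000)
    m1 = trapezoid(stats.halfnorm.pdf(d, scale=scale)
                   * stats.norm.pdf(d_hat, loc=d, scale=se), d)
    return m1 / stats.norm.pdf(d_hat, loc=0.0, scale=se)

def run_sequential(true_d, rng, n_min=20, n_max=40, batch=5):
    """One simulated study with optional stopping on the Bayes factor."""
    scores = list(rng.normal(true_d, 1.0, n_min))
    while True:
        d_hat = np.mean(scores) / np.std(scores, ddof=1)
        bf = bf10(d_hat, len(scores))
        if bf > 6 or bf < 1 / 6 or len(scores) >= n_max:
            return len(scores), bf
        scores += list(rng.normal(true_d, 1.0, batch))  # recruit next batch

rng = np.random.default_rng(1)
ns, bfs = zip(*[run_sequential(0.58, rng) for _ in range(300)])
print(f"Mean stopping n: {np.mean(ns):.1f}")
print(f"P(stop with BF > 6): {np.mean([b > 6 for b in bfs]):.2f}")
```

Running the same simulation with one large second batch versus several small ones shows directly whether the batching choice the reviewer asks about changes expected sample size or the rate of conclusive stops.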

1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses

The methodology and proposed analyses are described in sufficient detail to permit replication and to prevent undisclosed flexibility. One thing to note, though, is that the introduction mentions “We propose to test the hypothesis that, when visual stimuli are presented monocularly, participants’ average detection sensitivity (Stanislaw & Todorov, 1999), accuracy, and response time in a delayed change-detection VSTM task will not differ between the ipsilateral and contralateral conditions when TMS is applied”, whereas the proposed analysis plan appears to use only detection sensitivity as the outcome variable. Will accuracy and reaction times also be analyzed? And if so, how will these findings be interpreted in the light of the findings on d’? Another point is that the authors propose only t-tests (e.g., one per timing condition) and no statistical test of any interaction. They will need to be very careful to interpret their results so as not to make any claims that would have required testing the corresponding interaction (such as stating that the effect of TMS depended on when it was applied, which would require testing the Presentation side x Time of TMS interaction).

1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).

One outcome-neutral condition is proposed in Experiment 1: a difference in memory performance between ipsilateral and contralateral conditions when TMS is applied during encoding/perception (as opposed to TMS during retention). This is a strong positive control. As pointed out in Table 1, in case no difference between ipsilateral and contralateral conditions would be found in this control condition, it would indicate that TMS effects are undetectable between the ipsilateral and contralateral conditions. In that case, I think it would be useful to also have an outcome-neutral condition in Experiment 2, for the other proposed comparison, whereby one would test whether there is a difference in memory performance between real vs sham TMS when TMS is applied during encoding/perception (as opposed to TMS during retention). Also, there is currently no behavioral quality check; participants should reach a minimum level of memory performance to be included, in my opinion. 

Reviewed by , 08 Dec 2021

I reviewed the registered report by Phylactou et al., in which the authors propose a TMS study to investigate the contribution of lateralized (early) sensory cortex in the maintenance of visual information in short-term memory. The authors propose to use a novel combination of occipital TMS with dichoptic visual presentation that allows presentation of visual information to only one eye, such that it is processed in one hemisphere. The authors plan to conduct two experiments, in which sham TMS is to be used in experiment 2.

Overall, I think the motivation and argumentation for the study are strong and well developed. The coverage of literature and consideration of current shortcomings in TMS studies of the role of visual cortex is thorough and elaborate. The authors make a clear case about why another TMS study of the role of sensory cortex in short-term memory maintenance is needed. However, I feel that the authors should acknowledge and discuss the current debate in the literature between Xu and others about whether current evidence indeed supports a memory-maintenance role of sensory cortex vs. a more perceptual/encoding/attentional role. To be sure, the authors cite the relevant works and published opinion pieces, but I feel the discussion is rather left untouched -- given the focus of the proposed experiments, this issue must be considered at some point (here or in the manuscript of the results), to understand how the evidence would eventually weigh into the debate.

A few more comments about the Introduction:

1) I find the paragraph (p7) about stimulus complexity in (not?) finding TMS effects rather unclear. If more complex stimuli tax memory more, would that not make it easier for TMS to impair processing? Or do the authors suggest that more complex stimuli are processed at higher levels of visual processing? What is meant by "elemental features" (complex stimuli also contain elemental features)? Please elaborate on this notion or skip it altogether -- I would think the motivation to use gratings / gabors in this study does not depend on previous choices for complex stimuli.

2) The stated hypotheses (p8) seem counterintuitive or otherwise unclear to me. What does "will not differ between ipsilateral and contralateral conditions" mean here? Surely, the authors do expect that TMS at particular timepoints will affect processing in ipsilateral but not contralateral conditions? Further, what are the hypotheses about the timepoints?

3) More generally, the role of the different timepoints in short-term consolidation, or the effect TMS could have on them is not discussed in the Intro at all. This must be spelled out as it is a key part of the TMS design in both experiments. What is the motivation to stimulate at 200ms after sample offset?

Concerning the methodology, the general approach appears sound and is in keeping with previous designs, which facilitates comparison between studies. However, I do have several reservations and questions about the proposed design, which I will list below.

1) What is the motivation to use double-pulse TMS at 10 Hz?

2) The choice of sham TMS in experiment 2 is a valuable part of the experiment. However, the current manuscript provides very little explanation about sham TMS. The authors state that they will use an identical coil, but this seems incomplete or incorrect. If it is identical to real TMS, then stimulation procedures must be different (e.g., holding the coil at an angle to the head, or orienting the sham coil 90 degrees away from the real TMS orientation). If procedures are identical, then a different coil must be used (e.g., with thicker shielding to elicit acoustic and tactile sensation without inducing magnetic fields in the tissue). In short, information about the sham procedures and coil must be further specified. Further, according to the authors, how likely is it that participants will notice the difference between real and sham stimulation, and, if so, could this affect their findings? The pros and cons of sham stimulation are well described in a series of publications by Duecker and Sack (e.g., PLOS ONE, 2013; Frontiers in Psychology, 2015).

3) Some details about trial responses are missing. Will there be a limited response window? Will participants receive feedback about their responses? In the analysis, do the authors plan to use data trimming strategies (e.g., discarding overly fast or slow responses)? If so, please provide details.

4) The analyses include a series of T-tests, which seems inefficient and "statistically costly" in terms of number of comparisons. Why are the authors not first resorting to repeated measures ANOVAs? The pattern of main and/or interaction effects would then be able to guide subsequent tests (with lower multiple comparison costs). The T-tests as currently presented are perhaps meant as planned comparisons --  if so, then the temporal relations between TMS timepoints are fully ignored (e.g., comparing early with late TMS timepoints). 

5) The authors specify that participants will undergo a practice session of 24 trials. Given their status as "practice trials", it seems these data will not be analysed. However, I think a TMS-free condition prior to the main experiment would be an excellent way to assess baseline task performance, especially for the non-stimulated hemisphere. Therefore, I would recommend that the authors include a (post-practice) baseline session free of TMS pulses (but perhaps after mounting the coil to the head).

6) The phosphene procedure could perhaps be made more instrumental to the design and results. Phosphenes are arguably most easily elicited in peripheral vision, and the visual field location of induced phosphenes could provide information about the specificity of the TMS manipulation. The authors present their stimuli centrally, for good reasons, but I wonder whether the final TMS target location and phosphene visual field location could be informative in explaining (a lack of) effects. That is, would more centrally induced phosphenes impair memory maintenance more than peripheral phosphenes? If so, this would provide strong support for a topographically organized neural locus of the TMS effect (if any).

7) Related to phosphenes: I would recommend that the authors check TMS and phosphene location between (one or more) trial blocks after experiment commencement in order to catch and correct drifts in location or (unexpected?) changes in visual cortical sensitivity to TMS.

8) The authors do not mention analysis of response times (see also my previous comment about response times). If the authors will not analyse response times, please explain why. If they will analyse them, the analysis procedures must be explained (including any trimming or filtering).

9) Task stimuli: Some more details about the stimuli are required. What are the spatial frequency and contrast of the stimuli? Statement "(i.e. Gabor patch)" is not informative: Not all gratings are Gabor patches.

I hope that the authors can use my comments to further strengthen their interesting manuscript and study design.

Evaluation round #1

DOI or URL of the report:

Author's Reply, 21 Oct 2021

Download tracked changes file

Thank you very much for the valuable feedback, comments, and guidance. 

After discussing with my co-authors we have resubmitted an updated version.

As you will see in the updated document, in the latest version we solely use Bayesian methods and have focused our analyses on d', which is the main variable of interest.

For the analyses we suggest priors based on the effect sizes of a recent meta-analysis we have conducted.

Regarding our sample plan, given that our data will consist of repeated measures, we suggest sample updating for BF > 3 or < 1/3 (minimum 20 participants for counterbalancing, maximum 30 due to constraints in each experiment; total n = 40-60 participants).  

Looking forward to your positive response.

Decision by , posted 13 Oct 2021

Dear Dr Phylactou

Before I send your manuscript to review, could you address just one point, namely how the hypotheses are tested. Take your first hypothesis as an example. It is about the difference between contralateral and ipsilateral at 0 ms; so the most powerful test is a planned t-test for precisely this contrast. If you are using power and frequentist analyses, you need to determine power for precisely this contrast. The guide for authors states: "Since publication bias overinflates published estimates of effect size, power analysis should be based on the lowest available or meaningful estimate of the effect size." That is, you try to produce a minimally interesting effect size for this DV for this contrast. (It is unlikely that each DV is just as sensitive to the effect as any other.) You say you will also perform Bayes factors. Specifying both frequentist and Bayesian approaches gives you analytic flexibility; so for the RR choose one. For the BF, if you go this route, specify what scale factor you will use, and why it is relevant for that specific contrast. See here for guidance on specifying effects for both power and Bayes factors. Repeat the above for each test of each hypothesis; i.e. you need power determined for each contrast, or else the scale factor for a BF justified for each contrast. As it stands, you do not say anything about non-significant results: might the population effect be non-existent, and if so, how will you get evidence for this, and what sort of evidence would count against your hypotheses? Let me know if you have any questions.



User comments

No user comments yet