The authors propose an experiment to test the effect of prefrontal HD-tDCS combined with a dopaminergic pharmacological intervention (levodopa) on behavioural measures of mind wandering. Overall I found much to like in this proposal. The rationale for the study is persuasive and the authors have put a lot of careful thought into the study design and control conditions. I also found the inferential chain from hypotheses through to analysis and interpretation to be well constructed and broadly in line with RR requirements.
Most of my comments focus on points of presentational clarity and methodological detail. I hope the authors find them helpful.
1. In the study design table you note: “BF10 > 6 or BF01 > 6 for stopping rule is enough evidence to establish a meaningful result. For all tests BF10 > 3 or BF01 > 3 is supported by the literature as enough evidence to establish a meaningful result. In addition, if the credible intervals (CIs) do not cross 0 for the probit modelling, this will be interpreted as a meaningful effect.” This needs clarification and possibly simplification. What criteria exactly will determine a definitive outcome? The first sentence suggests a threshold of BF > 6, but the second seems to lower it to BF > 3, and the third adds an entirely separate decision criterion. Do these criteria differ between critical and non-critical hypotheses? This needs to be crystal clear.
2. You discuss “critical” hypotheses and in the design table mention “non-crucial” tests. Does “critical” mean hypotheses that form the basis of the stopping rule and “non-crucial” mean those that are not? This would benefit from some clarification. Perhaps use comparable terminology between them (e.g. critical and non-critical, rather than critical and non-crucial) and make clear for each hypothesis which type it is and what this means. In doing so, make explicit the precise conditions under which testing will stop.
3. In the design table you state “BF > 6 or BF > 6 is supported by the literature as enough evidence..” Is the double mention of “BF>6” a typo?
4. The hypotheses concerning tDCS amplitude are non-directional, which I assume is a deliberate decision reflecting the mixed results of previous studies. Given the (apparent?) lack of a clear rationale for why 1mA tDCS should produce a greater effect than 2mA tDCS, I found myself wondering whether the 1mA condition is necessary at all. If you removed the 1mA condition, you could increase the sample size for the comparison of 2mA vs sham and perform a more sensitive test of the tDCS x drug interaction. Then, in the event of a positive result, a later study (in a whole new RR) could home in on the dosage necessary to cause that effect. Even though this is how I would run the study, I offer it only as a suggestion to consider rather than a strong recommendation. However, if you do keep the tDCS dosage manipulation, I would suggest strengthening the rationale for it in the introduction. For me, it only really makes sense to include it if there is some reason to suppose that 1mA tDCS might be more effective than 2mA tDCS.
5. There are good reasons for adopting a between-subjects rather than a within-subjects design (including that it better preserves blinding of tDCS intensity and of active vs placebo, and helps to limit participant demand characteristics). However, given the relatively high cost in statistical sensitivity of between-subjects designs compared with within-subjects designs, I would recommend including a justification of this particular design choice in the method.
6. You set a maximum sample size of 40 per group across the 6 groups due to resource constraints. It would be ideal to include Bayes Factor Design Analyses (using the exact analysis method for each hypothesis) to determine how sensitive this sample size will be to effects of various sizes, given the chosen prior (see https://link.springer.com/article/10.3758/s13428-018-01189-8 and https://link.springer.com/article/10.3758/s13423-017-1230-y). I suspect that the design, as currently proposed, would be sufficient to detect only quite large effects, in which case this limitation should be noted. A BFDA would make this clear.
7. On p13 you note: “Participants will also be excluded from the study and replaced during the testing phase if their responses to the end of session questionnaire suggest that the participant did not understand how to correctly generate random number sequences. An example which would suggest the task has not been completed correctly would be if the participant cites a specific pattern that they used to approach the task (e.g., they repetitively used z,z,z,m,m,m,z,z,z,m,m,m to generate the sequences).” This strikes me as a sensible general rule, but for a Stage 1 RR it needs to be defined comprehensively and precisely, making clear the exact parameters under which a participant's response will be judged sufficiently non-random to warrant exclusion. The criterion must be fully pre-specified and reproducible, and it must eliminate all possible researcher degrees of freedom in both its definition and its implementation.
8. p16: “At the end of the session, participants and the experimenter will be asked to select which dopamine drug manipulation group they were in (levodopa, placebo), to assess the efficacy of the drug manipulation blinding”. Please specify exactly how the outcome of this test will be taken into account in the analysis and interpretation.
9. p17: “The stimulation will be immediately terminated if participants report experiencing any discomfort, or if there are any technical difficulties, including if Nurostym device identifies that the electrode impedances are too high, and self-terminates the stimulation.” Presumably participants who are excluded due to discomfort will be replaced? This should be stated explicitly as an exclusion criterion.
10. p17: “Given the FT-RSGT requires participants to respond accurately in time to a metronome tone, it is important to account for any influence of video game or musical training on participant’s response variability”. How exactly will this be accounted for in the confirmatory analyses?
11. Would it make sense to survey participants for the level of tDCS discomfort at the end of the session to account for potential disruptive effects of higher intensity stimulation on task performance? This seems particularly salient given that the tDCS is administered during the FT-RSGT, so (hypothetically) if 2mA stimulation happened to be significantly more uncomfortable than 1mA stimulation (yet insufficiently painful to lead to exclusion), the additional distraction could potentially explain differences in performance between 1mA vs 2mA over and above any cortical effects. One reason I mention this is that we have previously found in our own tDCS experiments that prefrontal 2mA stimulation can be painful in some participants (e.g. see footnote 1 on pp12-13 here: https://ore.exeter.ac.uk/repository/bitstream/handle/10871/124886/Sedgmond_BehaviouralNeuroscience_2020.pdf; in that study we actually had to turn it down to 1.5mA after Stage 1 IPA).
12. pp19-20, section “Post-study data exclusion”: will participants who are excluded for being outliers within their group be replaced? I assume so, but please state this explicitly. In addition, regarding “Finally, to ensure that extreme outliers during the task do not skew any time on task effects, individual trials which are greater than 3 standard deviations above or below the mean for each group’s approximate entropy and behavioural variability scores will also be removed from the analyses.”: what percentage of individual trials would need to be removed before an entire participant would be excluded and replaced outright? Are there any other data-based exclusion criteria of any kind, either at the level of trials within participants or of participants within the sample? I recommend reviewing these very carefully, as these criteria cannot (in general) be changed after IPA.
13. p20: concerning the probit modelling, the authors note that “there will be several predictor variables, alongside their interactions, however participants stimulation condition (2mA active, 1mA, or sham) or dopamine condition (levodopa vs. placebo) will be entered in as the key predictor for the respective analyses, which are explained in detail below.” Please specify the full range of predictor variables and parameters. To nail this down precisely, I recommend including an analysis script as part of the Stage 1 revision based on simulated data.
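To make point 1 concrete, the following minimal Python sketch shows the kind of unambiguous decision rule I have in mind. The thresholds (6 for critical and 3 for non-critical hypotheses), the requirement that all critical hypotheses be conclusive before stopping, and the function names are my own assumptions for illustration, not the authors' stated plan; only the maximum of 40 per group is taken from the proposal.

```python
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    H1 = "evidence for H1"
    H0 = "evidence for H0"
    INCONCLUSIVE = "inconclusive"


# Hypothetical thresholds for illustration only: critical hypotheses drive the
# stopping rule; non-critical hypotheses are reported but never stop testing.
THRESHOLDS = {"critical": 6.0, "non-critical": 3.0}


def classify(bf10: float, threshold: float) -> Outcome:
    """Classify one Bayes factor against a single symmetric threshold."""
    if bf10 <= 0:
        raise ValueError("BF10 must be positive")
    if bf10 > threshold:
        return Outcome.H1
    if 1.0 / bf10 > threshold:  # BF01 is the reciprocal of BF10
        return Outcome.H0
    return Outcome.INCONCLUSIVE


def should_stop(critical_bf10s: Sequence[float], n_per_group: int, n_max: int = 40) -> bool:
    """Stop when every critical hypothesis is conclusive or the maximum N is reached."""
    all_conclusive = all(
        classify(bf, THRESHOLDS["critical"]) is not Outcome.INCONCLUSIVE
        for bf in critical_bf10s
    )
    return all_conclusive or n_per_group >= n_max
```

Under these assumptions, a BF10 of 4 is inconclusive for a critical hypothesis (classify(4.0, 6.0)) but counts as evidence for H1 for a non-critical one (classify(4.0, 3.0)). Whatever rule the authors adopt, it should be possible to write it down this explicitly, including how the probit credible-interval criterion fits in.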
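For point 6, a minimal fixed-N design analysis for the simplest comparison in the design (one active tDCS group vs sham, analysed with a two-sample JZS Bayes factor) might look like the Python sketch below. The assumptions are mine: normally distributed outcomes with unit SD, a true standardized effect delta, a Cauchy prior with scale √2/2 (the default “medium” scale in the BayesFactor R package; the proposal's own prior should be substituted) and the BF > 6 threshold. The tDCS x drug interaction and the probit models would each need their own simulation based on the exact planned analysis, and the BFDA papers cited in point 6 describe the sequential version.

```python
import math
import random
import statistics


def jzs_bf10(t: float, n1: int, n2: int, r: float = math.sqrt(2) / 2) -> float:
    """Two-sample JZS Bayes factor (Rouder et al., 2009) with a Cauchy(0, r) prior on effect size.

    The prior is written as delta | g ~ N(0, g), g ~ InverseGamma(1/2, r^2/2); the
    integral over g uses the trapezoid rule on u = log(g).
    """
    nu = n1 + n2 - 2
    n_eff = n1 * n2 / (n1 + n2)
    log_h0 = -(nu + 1) / 2 * math.log1p(t * t / nu)
    lo, hi, steps = -20.0, 20.0, 4000
    h = (hi - lo) / steps
    logs = []
    for i in range(steps + 1):
        u = lo + i * h
        g = math.exp(u)
        log_lik = (-0.5 * math.log1p(n_eff * g)
                   - (nu + 1) / 2 * math.log1p(t * t / ((1 + n_eff * g) * nu)))
        log_prior = math.log(r) - 0.5 * math.log(2 * math.pi) - 1.5 * u - r * r / (2 * g)
        logs.append(log_lik + log_prior + u)  # "+ u" is the Jacobian dg = g du
    m = max(logs)  # shift before exponentiating to avoid overflow
    area = h * sum((0.5 if i in (0, steps) else 1.0) * math.exp(v - m)
                   for i, v in enumerate(logs))
    return math.exp(m + math.log(area) - log_h0)


def two_sample_t(x, y):
    """Pooled-variance (Student) two-sample t statistic."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))


def bfda(delta, n_per_group, n_sims=500, threshold=6.0, seed=1):
    """Fixed-N design analysis: share of simulated studies ending in each outcome."""
    rng = random.Random(seed)
    counts = {"H1": 0, "H0": 0, "inconclusive": 0}
    for _ in range(n_sims):
        active = [rng.gauss(delta, 1.0) for _ in range(n_per_group)]
        sham = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        bf10 = jzs_bf10(two_sample_t(active, sham), n_per_group, n_per_group)
        if bf10 > threshold:
            counts["H1"] += 1
        elif 1.0 / bf10 > threshold:
            counts["H0"] += 1
        else:
            counts["inconclusive"] += 1
    return {k: v / n_sims for k, v in counts.items()}
```

Calling bfda(delta, 40) for a range of plausible effect sizes (say 0.3, 0.5 and 0.8) gives the probability of a conclusive outcome at the maximum sample size for each; the pure-Python integration is slow, so a few hundred simulations per effect size is a sensible starting point.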
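For point 8, one possible pre-specification is sketched below in Python: score each end-of-session guess as correct or incorrect and compute a Bayesian binomial test of whether guess accuracy departs from chance (50% for a two-alternative guess). The uniform Beta(1, 1) prior under H1 and the function name are my assumptions, and participants' and the experimenter's guesses would be analysed separately. What the authors would then do with a conclusive result (e.g. include guess as a covariate, or qualify the interpretation) is exactly what the Stage 1 manuscript needs to state.

```python
import math


def binomial_bf10(k: int, n: int) -> float:
    """BF10 for guess accuracy theta != 0.5, with a uniform Beta(1, 1) prior under H1.

    The marginal likelihood under H1 is B(k + 1, n - k + 1); under H0 it is 0.5 ** n
    (the binomial coefficient cancels in the ratio).
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    log_h1 = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
    log_h0 = n * math.log(0.5)
    return math.exp(log_h1 - log_h0)
```

For example, 5 correct guesses out of 10 gives BF10 ≈ 0.37, whereas 10 out of 10 gives BF10 ≈ 93.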
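For point 12, the following Python sketch shows how the quoted trial-level rule could be paired with a participant-level cut-off. The 20% cut-off, the data layout and the names are hypothetical; the rule would be run separately for approximate entropy and for behavioural variability. Writing it out exposes a decision the text leaves open: the mean and SD here are computed per group across all trials, as the quoted sentence implies, whereas computing them per participant would flag different trials, so the authors should state which they intend.

```python
from collections import Counter, defaultdict, namedtuple
from statistics import fmean, stdev

# One trial-level score (e.g. approximate entropy or behavioural variability).
Trial = namedtuple("Trial", ["participant", "group", "score"])


def flag_trials(trials, k=3.0):
    """Flag trials more than k SDs above or below their group's mean score."""
    by_group = defaultdict(list)
    for tr in trials:
        by_group[tr.group].append(tr.score)
    group_stats = {g: (fmean(s), stdev(s)) for g, s in by_group.items()}
    return [abs(tr.score - group_stats[tr.group][0]) > k * group_stats[tr.group][1]
            for tr in trials]


def participants_to_replace(trials, flags, max_prop=0.2):
    """Participants whose share of removed trials exceeds max_prop (hypothetical cut-off)."""
    total, removed = Counter(), Counter()
    for tr, flagged in zip(trials, flags):
        total[tr.participant] += 1
        removed[tr.participant] += int(flagged)
    return sorted(p for p in total if removed[p] / total[p] > max_prop)
```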
Minor
Typo: check spelling for Nerostym / Nurostym as both are used in different places
p19: alterative > alternative (multiple instances)