I have examined the authors' responses and the revised article. The authors have addressed my comments very adequately, providing a clearer explanation of the study's motivation and further detailing and improving the analysis pipeline. I thank the authors for addressing all reviewers' points and look forward to the findings.
DOI or URL of the report: https://osf.io/p6aum
Version of the report: 1
Thank you for the opportunity to contribute a stage 1 review of the manuscript ‘Is CPP an ERP marker of evidence accumulation in perceptual decision-making? A multiverse study’. To adhere to the advice on key issues to consider at stage 1 provided by the peer community site, I have formatted my review into two sections: (1) responses to each key issue, and (2) additional comments on the planned multiverse analysis.
Key issues as recommended by the peer community site
1. ‘Does the research question make sense in light of the theory or applications? Is it clearly defined? Where the proposal includes hypotheses, are the hypotheses capable of answering the research question?’
a. The hypothesis noted in the final paragraph of the introduction is a well-defined and testable statement that aligns with the theoretical framework. However, the summary in Table 1 loses this precision. I recommend that the authors amend the question and hypothesis within Table 1 to match the precision provided in the introduction. For example, question: Is CPP a consistent ERP marker for evidence accumulation at the trial level across multiple perceptual decision-making tasks? Hypothesis: If CPP is a generalisable ERP marker of evidence accumulation, then CPP build-up rate will show a statistically significant positive correlation with drift rate across multiple perceptual tasks.
2. ‘Is the protocol sufficiently detailed to enable replication by an expert in the field, and to close off sources of undisclosed procedural or analytic flexibility?’
a. It would be more transparent if the authors stated the decisions taken at the following points in the workflow: unacceptable task performance (if participants were not excluded based on task performance, it would be clearer to state this explicitly; if they were, please report the threshold used); whether variables were normalized and/or centered; whether adjustments were made for multiple testing; whether bad channels in the EEG datasets were identified and, if so, how they were handled; and whether bad data segments in the EEG datasets were identified and, if so, how they were handled.
3. ‘Is there an exact mapping between the theory, hypotheses, sampling plan (e.g. power analysis, where applicable), preregistered statistical tests, and possible interpretations given different outcomes?’
a. The recommended amendment to the hypothesis at point 1 above would improve the direct mapping of the theoretical background to the hypothesis. It is noted that the authors use previously collected datasets; therefore, an a priori power analysis is not applicable. However, the authors could report a sensitivity analysis to determine the smallest effect size that the existing sample sizes could reliably detect with a desired level of power (e.g., 80%), or commit to calculating the observed power based on the effect size obtained after conducting the analyses. The statistical tests are specified in advance and align with the hypothesis.
4. ‘For proposals that test hypotheses, have the authors explained precisely which outcomes will confirm or disconfirm their predictions?’
a. Yes.
5. ‘Is the sample size sufficient to provide informative results?’
a. As explained under point 3, this remains unclear until the authors either report a sensitivity analysis or commit to calculating the observed power.
6. ‘Where the proposal involves statistical hypothesis testing, does the sampling plan for each hypothesis propose a realistic and well justified estimate of the effect size?’
a. The authors analyse preexisting datasets. While they do not report the sampling approaches themselves, they refer readers to the original studies for further details.
7. ‘Have the authors avoided the common pitfall of relying on conventional null hypothesis significance testing to conclude evidence of absence from null results? Where the authors intend to interpret a negative result as evidence that an effect is absent, have authors proposed an inferential method that is capable of drawing such a conclusion, such as Bayesian hypothesis testing or frequentist equivalence testing?’
a. The authors interpret the 95% highest density interval (HDI) of the posterior distribution for the effect of CPP build-up rate on drift rate, which allows probabilistic statements about parameter estimates rather than reliance on p-values. They specify a criterion for concluding a positive effect: if the lower bound of the 95% HDI is above zero, they will interpret this as evidence of a positive correlation between CPP and drift rate. This implies that, should the HDI include zero, they would not conclude that the effect is absent but would instead interpret the result as insufficient evidence to support a positive correlation.
8. ‘Have the authors minimised all discussion of post hoc exploratory analyses, apart from those that must be explained to justify specific design features? Maintaining this clear distinction at Stage 1 can prevent exploratory analyses at Stage 2 being inadvertently presented as pre-planned.’
a. The authors have detailed a clear, pre-specified approach, with justification for the structured analysis plan. They report a predefined criterion for evaluating the effect of CPP build-up rate on drift rate.
9. ‘Have the authors clearly distinguished work that has already been done (e.g. preliminary studies and data analyses) from work yet to be done?’
a. It is not immediately clear which analyses have already been completed in prior publications using these datasets. Relatedly, a clear justification is required for why the selected datasets were chosen from those available for the present study.
10. ‘Have the authors prespecified positive controls, manipulation checks or other data quality checks? If not, have they justified why such tests are either infeasible or unnecessary? Is the design sufficiently well controlled in all other respects?’
a. This is not reported in the present stage 1 manuscript.
11. ‘When proposing positive controls or other data quality checks that rely on inferential testing, have the authors included a statistical sampling plan that is sufficient in terms of statistical power or evidential strength?’
a. This is covered in my response to 3 and 10.
12. ‘Does the proposed research fall within established ethical norms for its field? Regardless of whether the study has received ethical approval, have the authors adequately considered any ethical risks of the research?’
a. Yes, the proposed research falls within established ethical norms for the field.
Multiverse analysis
It is encouraging to see that the authors wish to report uncertainty and assess the robustness of results to variations in data analysis decisions. Multiverse analyses should be systematic and their decisions transparent. Therefore, the authors should: (1) specify which elements of the workflow are subjected to the multiverse analysis (i.e. two decision nodes in the analytical procedure are forked, whereas a multiverse analysis could in general also fork behavioural and EEG data preprocessing decisions); (2) for the decision nodes that are forked, transparently report the options considered at each node, including those that were not included, and the decision-making procedure used to select those that were; this will help readers identify potential bias in the reported multiverse of results; and (3) state whether the included options are equivalent (e.g. a principled multiverse, Del Giudice & Gangestad, 2021) and, if so, on which criteria they are deemed equivalent (e.g., comparable validity, examining the same effect, or estimating the effect with comparable precision).