DOI or URL of the report: https://osf.io/z8hb7
Version of the report: RRManuscript_Stage2_revision_Clean
We are almost there!
“We found that all three types of magnitude comparison necessitated WM resources, albeit in distinct ways”
The phrase "in distinct ways" brings the unjustified claim of a difference back in, so it should be removed. The following sentences in the discussion clarify appropriately.
“Also, we observed different interference effects in processing and translating numerical representations for smaller (1 - 4) and larger (5 – 9) numerosities, which were examined separately.”
I see that you have emphasized that the statement is based on separate tests, so the "different interference effects" claim is best read as literally a different pattern of significance and non-significance, which is true. However, an intuitive reading remains that the effects themselves were different. That reading is likely because so many researchers do draw that illegitimate conclusion in their own work (understandably: the rules for asserting conclusions in hypothesis testing are paradoxical). So you could add at the end "so differences in the effects cannot be claimed" (unless you tested the differences in interference effects and found them significant).
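To illustrate what "testing the differences in interference effects" could look like, here is a minimal sketch. It assumes each participant contributes one interference effect for small and one for large numerosities, and it uses a sign-flip permutation test on the per-participant difference of those effects. The function name, the data values, and the choice of a permutation test are all assumptions made for illustration; they are not the pre-registered analysis.

```python
import random
from statistics import mean


def sign_flip_test(diffs, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test of H0: mean(diffs) == 0.

    diffs holds one value per participant (here: the difference between
    two interference effects). Under H0 the sign of each value is arbitrary,
    so signs are flipped at random to build the null distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Made-up interference effects (single-task minus dual-task accuracy),
# one value per hypothetical participant. Not data from the study.
effect_small = [0.05, 0.02, 0.08, -0.01, 0.04, 0.06, 0.03, 0.07]
effect_large = [0.03, 0.04, 0.02, 0.01, 0.05, 0.02, 0.06, 0.03]

# The quantity to test is the difference between the two effects,
# not the two effects separately.
diff_of_effects = [s - l for s, l in zip(effect_small, effect_large)]
print(sign_flip_test(diff_of_effects))
```

Only if a test of this kind is significant would a claim that the two interference effects differ follow.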
best
Zoltan
DOI or URL of the report: https://osf.io/x76tz?view_only=bf8c069e022540aa9272452804f27db2
Version of the report: 1
The two reviewers are positive about the Stage 2, but make some important points. Reviewer 2 asks for some more clarity about the central executive; note that you cannot add the suggested extra sentences to the introduction, but there might be some way of addressing the point in the discussion itself. Reviewer 1 points out that the pre-registered protocol was not followed exactly with respect to the conditions under which you would use non-parametric tests. It is important to stick to the letter of the pre-registration. I checked your reported analyses against the Design Table and found other discrepancies:
p 28
"Finally, comparing the two interference conditions (PL and VSSP), we found no difference either for accuracy, t(80) = 0.58, p = .57, d = 0.06, or median RT, t (78) = 0.09, p = .93, d = .01."
and also
". Finally, performance in the two non-symbolic interference conditions (PL and VSSP) significantly differed, t(79) = 2.44, p = .02, d = 0.27, with participants performing better under PL than VSSP interference. Median RTs did not differ between these two conditions, t(78) = 1.48, p = .14, d = 0.17."
These are not in the Design Table; place in a separate exploratory section.
Two things to bear in mind in drawing conclusions: Only conclude there was a difference between conditions if the difference was tested (and only draw such conclusions in the abstract or give them importance in the discussion if they were pre-registered); and also do not assert there was no effect because of a non-significant result unless you calculated power with a justified minimal effect for precisely that test.
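As a rough illustration of the second point, the sketch below approximates the power of a two-sided paired test to detect a given minimal effect. It uses a normal approximation to the t-test, and the effect size of 0.3 and sample size of 80 are placeholders chosen for illustration only; the minimal effect of interest would need its own justification for each test.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_paired(dz, n, alpha=0.05):
    """Approximate power of a two-sided paired test (normal approximation).

    dz    : standardized mean of the paired differences (minimal effect of interest)
    n     : number of participants
    alpha : two-sided significance level
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = dz * sqrt(n)
    # Probability of landing in either rejection region when the true effect is dz.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Placeholder values, not taken from the manuscript.
print(approx_power_paired(dz=0.3, n=80))
```

If the power for the justified minimal effect is low, a non-significant result is uninformative rather than evidence of no effect.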
p 34 "We also found that mechanisms for processing and translating numerical representations can differ for smaller (1 - 4) and larger (5 – 9) numerosities."
As the difference between numerosities was not tested this conclusion does not follow.
p 35
"However, these speed-accuracy correlation coefficients did not differ significantly, which suggests that there was no significant change in participants’ speed-accuracy trade-off across the conditions. Thus, a speed-accuracy trade-off on its own cannot explain the finding of improved accuracy in interference conditions."
A non-significant result without a properly justified power analysis for precisely that test does not justify asserting no effect.
abstract
"albeit involving different components of WM, to a different extent."
Tests of differences between dot and digit tasks, or between their respective dual task conditions, were not pre-registered; so claims about such differences should not appear in the abstract.
One further point of clarity:
"Surprisingly, in this task, accuracy improved"
I look forward to receiving your revision!
The authors submitted a thorough and transparent stage 2 registered report. As the stage 1 version was already based on an in-depth consideration of theoretical implications and the current empirical evidence, stage 2 follows up with a structured performance of the respective experiments and a comprehensible discussion in the light of relevant theories. The current study adds to available literature as it clarifies which component of the working memory is related to processing of and translating between different numerical modalities. This is obtained using a dual-task paradigm of (cross-modal) numerical comparison and working memory tasks. As the authors found working memory involvement to be crucial, they discussed that processing (non)symbolic magnitudes and translating between symbolic and non-symbolic numerical representations cannot be fully automatized.
I would like to stress that, on top of detailed reporting of results in the main text, the Supplementary Material provides interesting additional information. The authors also transparently reported deviations between stage 1 and stage 2.
Minor points:
The use of the past tense in the section “The present study” in the introduction is not entirely consistent, so you might want to check whether you converted all present and future tense forms from stage 1 to past tense for stage 2.
Formatting regarding paragraph indentation differs between sections – please unify.
Page 18: “In the cross-modal comparison task, the side of presentation for the Arabic symbol was be counterbalanced.” – there is a linguistic error in transforming the old future tense into the past tense, needs to be “was counterbalanced”.
Also page 18: “(“z” if the left quantity is larger, “m” if the right quantity is larger)” – should be in the past tense “if the left quantity was larger”
In your planned analyses section, you stated that you would use Wilcoxon tests for variables showing a skew > 3. However, in the results section, you only report t statistics despite several variables exceeding this threshold in RQ1a and 1b. However, for RQ2 you do report non-parametric results. Therefore, it remains unclear to the reader when you used parametric or non-parametric tests.
Could you add descriptive statistics for the comparison small vs. large quantities as well in the main text?
The graphical summary of all results was very helpful. The figure would be easier to read if you added a note explaining the arrows a little bit more, i.e., stating that an arrow facing upwards indicates better accuracy/slower reaction time for dual and an arrow facing downwards indicates worse accuracy/faster reaction time for dual.
Page 32: “Non-symbolic comparison necessitated VSSP WM but also PL albeit to a lesser extent.” – on which result is this “to a lesser extent” based? Did you compare effect sizes?
Page 34: “On the contrary, we observed widespread interference effects in this primary task (see Figure 5)” – this sentence lacks a full stop. Also, “Thus, it appears that attentional mechanisms can mitigate WM interference under certain conditions.” – the text does not need to be underlined.
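Regarding the point above about Wilcoxon tests for variables with a skew > 3: one way to make the decision rule transparent to the reader is to state it as an explicit check. The sketch below is only an illustration. It assumes the moment-based sample skewness and compares its absolute value with the threshold; whether the pre-registration meant the signed or absolute skew, and whether it applies to the raw variables or the paired differences, is for the authors to specify.

```python
from statistics import mean


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (assumed definition)."""
    m = mean(values)
    m2 = mean((v - m) ** 2 for v in values)
    m3 = mean((v - m) ** 3 for v in values)
    if m2 == 0:
        return 0.0  # constant data: no skew to speak of
    return m3 / m2 ** 1.5


def choose_test(values, threshold=3.0):
    """Return which test the skew rule selects for this variable."""
    return "wilcoxon" if abs(sample_skewness(values)) > threshold else "t-test"


# Made-up examples: a symmetric variable and a heavily right-skewed one.
print(choose_test([1, 2, 3, 4, 5]))
print(choose_test([0] * 99 + [100]))
```

Reporting the skewness value next to each test would let readers see at a glance why a parametric or non-parametric test was used.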