
The role of working memory in translating between different number processing systems

Can adults automatically process and translate between numerical representations?
Recommendation: posted 27 October 2022
Dienes, Z. (2022) The role of working memory in translating between different number processing systems. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=105
Related stage 2 preprints:
I. Xenidou-Dervou, C. Appleton, S. Rossi, N. Guy, C. Gilmore
https://osf.io/me6tn
Recommendation
The Stage 1 manuscript was evaluated over two rounds of in-depth review. Based on detailed responses to the reviewers’ comments and edits to the Stage 1 report, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol: https://osf.io/32qdw
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
- Advances in Cognitive Psychology
- Cortex
- Experimental Psychology
- F1000Research
- Journal of Cognition
- Peer Community Journal
- PeerJ
- Psychology of Consciousness: Theory, Research, and Practice
- Royal Society Open Science
- Swiss Psychology Open
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #3
DOI or URL of the report: https://osf.io/dhpcj?view_only=bf8c069e022540aa9272452804f27db2
Version of the report: v2
Author's Reply, 22 Oct 2022
Dear Prof Dienes,
Thank you for the excellent point raised. As suggested, besides editing the design table accordingly, we have now also added a sentence on p. 18 in the revised manuscript explicitly explaining this:
“Although calculations of our smallest effect sizes of interest were informed by our theoretical predictions and practical considerations, they can also inevitably be considered arbitrary since no study has previously examined these effects, therefore nonsignificant results will be treated tentatively.”
We hope this meets with your approval,
Best wishes,
Iro
Decision by Zoltan Dienes, posted 08 Oct 2022, validated 27 Oct 2022
Dear Iro
As per our email exchange, I said:
"In terms of justifying a minimally interesting effect you say:
"Based on estimates of adult performance on standalone comparison tasks (e.g., dot comparison accuracy: 99.7%, SD = 0.3; Lyons et al., 2012), we powered to detect a difference in 5 out of 160 trials on the primary task, which would reflect a 3% difference."
I don't understand how the 3% follows from the 99.7% figure of Lyons et al. I am not grasping why this is "the smallest relevant decrease in performance" that follows from the average performance.
I know justifying a minimal interesting effect is quite difficult! But I didn't follow the logic here at all.
While I am here, a shortly preceding sentence says:
"For secondary task performance, Monaco et al. (2013) found that participants can recall sequences of up to approximately 6 items in a phonological sequence, and 5 items in a visuospatial sequence, and therefore we used these estimates as our means, and a standard deviation of 4 (the number of times each sequence length is presented)."
The phrase in brackets implies it is a reason for setting the SD to 4; this I didn't understand either."
You revised the paper accordingly, though the rationale for the minimally interesting effects was somewhat arbitrary (understandably, minimally interesting effects are in general very hard to pin down), though intuitively reasonable. So I suggested that, in the design table, you indicate that a non-significant result does not decisively count against the hypotheses. The only extra thing I ask now is that you explicitly add a sentence to this effect after explaining how you obtained your minimally interesting effects, i.e., that non-significant results will be treated tentatively.
best
Zoltan
Evaluation round #2
DOI or URL of the report: https://osf.io/ktq8e/?view_only=bf8c069e022540aa9272452804f27db2
Version of the report: v2
Author's Reply, 30 Jul 2022
Decision by Zoltan Dienes, posted 29 Jun 2022
Dear Dr Xenidou-Dervou
Both reviewers are largely happy with your revisions, but would like some minor revisions, mainly to tie down remaining analytic flexibility to do with the use of directional vs non-directional tests, and also outlier exclusion.
I would also like some more scientific justification for the minimal effect sizes chosen. You refer to past papers for deriving them, but I wasn't sure how you infer a minimally interesting and plausible effect from those papers. For example, why 100 ms? Would an effect of 50 ms really not be interesting (or else not plausible)? Why would 5 trials be just interesting? I realize these are nice clean numbers, and hence simple, and that is something going for them; but I worry a non-significant result might not mean much if the true effect was e.g. 50 ms and this would have been very interesting scientifically if known.
best
Zoltan
Reviewed by Lincoln Colling, 29 Jun 2022
I'm mostly happy with the revision, however, I do have a few points to note.
In the table in Appendix C some of the hypotheses that have been stated are explicitly directional and some are not explicitly directional when they should be.
For example, RQ3b: "This means we will expect to see a decrease in performance in the non-symbolic primary task in the VSSP dual-task condition compared to the standalone condition for large quantities, but no decrease in performance for small quantities."
But then "If t-tests for both small and large quantities are significant (p < .05)" Surely this should be directional.
or in RQ1b there's a mix of "If t-test is significant at p < 0.05, and indicates that performance is lower" (which is directional) but then "but the t-test indicates that there is no significant difference between the VSSP dual-task" (which is non-directional). If the hypotheses are directional then they should be clearly stated as directional. Furthermore, since two-tailed tests are being used, it should be clarified whether "non-significant" means two-tailed non-significant or whether it just means a result NOT "significant in the specified direction" (that is, one-tailed non-significant: i.e., t(10) = 2 gives a two-tailed p of ~.07, but a one-tailed p of either ~0.036 or ~0.96 depending on the tail that one looks at). For example, what if there is a significant difference, but just not in the predicted direction? For this reason, I think it might be easier to use one-tailed tests (at an alpha of 0.025) rather than two-tailed tests (at an alpha of 0.05) when the direction actually matters, and only use two-tailed tests when the direction doesn't matter.
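The tail arithmetic in this point can be verified directly; a minimal sketch in Python (assuming `scipy` is available; the numbers are the reviewer's illustrative example, not study data):

```python
from scipy.stats import t

# Reviewer's example: an observed t(10) = 2
tval, df = 2.0, 10

p_two = 2 * t.sf(tval, df)   # two-tailed p, ~0.073
p_right = t.sf(tval, df)     # one-tailed p in the predicted direction, ~0.037
p_left = t.cdf(tval, df)     # one-tailed p testing the opposite direction, ~0.963

# At a one-tailed alpha of 0.025, p_right ~0.037 is NOT significant,
# even though the same t is close to the two-tailed 0.05 criterion:
# the two decision rules can disagree, which is the reviewer's point.
```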
Reviewed by Hannah Dorothea Loenneker, 13 May 2022
Thanks a lot for implementing our comments so rigorously, especially specifying the analysis decision tree.
I have only one more comment on data pre-processing regarding the following sentence in the introduction of Appendix C:
“Outliers will be examined for each condition (all combinations of primary and secondary task) and extreme outliers (> 3.29 SD, Field, 2016) will be removed from the analysis for that condition.”
As it makes a difference whether you look at group means or individual means per condition, I would like you to clarify this here. Furthermore, I see examining outliers separately per condition as problematic: André (2021) outlined in a recent simulation study that data pre-processing needs to be blind to the research hypothesis, and addressing outliers separately per condition can artificially induce effects.
André, Q. (2021). Outlier exclusion procedures must be blind to the researcher’s hypothesis. Journal of Experimental Psychology: General.
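One way to keep exclusion blind in the sense André describes is to compute a single cut-off over all trials pooled, rather than per condition; an illustrative sketch (assuming the 3.29-SD criterion quoted above; the function name and data are hypothetical, not the authors' pipeline):

```python
import numpy as np

def blind_outlier_mask(rt, z_crit=3.29):
    """Return a keep-mask for trials whose |z| <= z_crit, where the mean
    and SD are computed over ALL trials pooled across conditions, so the
    exclusion rule never sees the condition labels."""
    rt = np.asarray(rt, dtype=float)
    z = (rt - rt.mean()) / rt.std(ddof=1)
    return np.abs(z) <= z_crit

# 50 plausible RTs plus one extreme value: only the extreme trial is dropped.
mask = blind_outlier_mask(list(range(400, 600, 4)) + [5000])
```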
Evaluation round #1
DOI or URL of the report: https://osf.io/evb6y/?view_only=bf8c069e022540aa9272452804f27db2
Author's Reply, 04 May 2022
Decision by Zoltan Dienes, posted 07 Mar 2022
Dear Dr Appleton
Sorry for the delay in getting back to you about your submission; I sent out 33 review requests, and I have now obtained reviews from 2 experts. Both have very helpful comments to make about improving the manuscript. Loenneker requests some clarifications regarding the introduction and methods. Note there is no need to write an anticipated discussion at this point. Colling makes the point that "a power analysis is only as valuable as the effect size estimates that go into it"; that is, for power and equivalence tests, the point is to find the minimal effect of interest, as justified by your particular scientific context: see here for advice https://psyarxiv.com/yc7s5/. A minimal effect of interest is different from a roughly expected effect, but you use both terms; further, you do not argue why the effects you chose are minimally interesting ones. Colling suggests the use of Bayes factors; these do incorporate roughly expected effect sizes (see previous reference).
One further comment both on Colling's advice and on your planned analyses: the aim of a Registered Report is to tie down analytic flexibility. Thus one should stick to one inferential approach that allows justification for asserting no effect or an effect. That could be power, equivalence testing, or Bayes factors. Stick with one in the pre-registration. Note that your procedure of performing a significance test against 0 and then following up with an equivalence test if the former is non-significant can lead to an inconsistency, because different models are used for the two classes of test: a t-test against 0 can be significant and yet an equivalence test would have shown equivalence had it been done. So the way to use equivalence testing as one coherent procedure is to generalize it as "inference over intervals": determine if the X% CI lies inside or outside the equivalence region (assert/reject equivalence, respectively) or straddles it (suspend judgment).
Colling's point remains for this as well as power: The procedure only makes scientific sense if the equivalence region is scientifically justified as the region of no interest.
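The "inference over intervals" decision rule described above can be sketched as follows (illustrative Python assuming a paired design with difference scores and a 90% CI; the function name and equivalence bounds are hypothetical, and the bounds must still be justified scientifically):

```python
import numpy as np
from scipy.stats import t

def interval_decision(diffs, low, high, alpha=0.05):
    """Classify a (1 - 2*alpha) CI for the mean difference against an
    equivalence region [low, high]:
      CI inside the region    -> assert equivalence
      CI outside the region   -> reject equivalence
      CI straddles a boundary -> suspend judgment
    """
    diffs = np.asarray(diffs, dtype=float)
    n = diffs.size
    m = diffs.mean()
    se = diffs.std(ddof=1) / np.sqrt(n)
    crit = t.ppf(1 - alpha, n - 1)
    lo, hi = m - crit * se, m + crit * se
    if low <= lo and hi <= high:
        return "assert equivalence"
    if hi < low or lo > high:
        return "reject equivalence"
    return "suspend judgment"
```

Unlike the significance-test-then-equivalence-test sequence, this single rule cannot produce contradictory verdicts, because one interval is compared against one region.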
Having said all that, the reviewers agree the design is well thought out to tackle the issues you wish to address, so I very much look forward to seeing a tidied up manuscript.
best
Zoltan
Reviewed by Hannah Dorothea Loenneker, 06 Dec 2021
The authors conceptualized a thorough study, investigating whether adults can automatically process and translate between numerical representations. Their research can be an interesting foundation for further studies, as they’re aiming at deepening the understanding of the link between numerical entities coded in different modalities within human cognition. They use the experimental design of a dual-task set-up, with different secondary tasks to assess whether working memory is a necessary prerequisite for (non-)symbolic magnitude comparisons and cross-modal translations.
Overall remarks:
I would like to acknowledge the authors’ intention to make their raw data publicly available and would like to stress the importance of well-documented meta-data with accessible descriptions of the respective variables (following the FAIR criteria).
I feel like there are some small language errors in your manuscript – please revise.
Introduction
Page 5, last paragraph: I wouldn’t agree that the first numerical representation children acquire is the verbal one, as studies implementing the looking time paradigm already show an increase in precision of the ANS between the ages of 6 to 10 months (e.g., Brannon, Suanda, Libertus, 2007; Lipton, Spelke, 2003; Wood, Spelke, 2005; Xu, Spelke, Goddard, 2005).
Page 5: “The ANS is assumed to provide estimates of the numerosity of a given set. Repeated presentations of the same numerosity result in varying points of activation and consequently representations of quantity in the ANS are approximate.” -> it’s not quite clear to me what you’re trying to say here. Can you reformulate?
I would probably introduce the role of inhibition and visuo-spatial skills for performance in ANS tasks when describing the association of ANS and arithmetic on page 6. This already gives a hint for your later research question that domain-general cognitive abilities may play a role for the associations between the different representations.
Can you clarify why working memory is the only domain-general cognitive function you expect to modulate the relationship between the different numerical representations? Considering the literature in this field, I do see several other candidates such as e.g., inhibition, verbal skill, or visuo-spatial skill.
Methods
What is your rationale for such a broad age range? With working memory and executive functions decreasing with age, I would suspect that you might find differences between age groups. Will you correct for these (for example adding age as a covariate in case it is significantly correlated with your outcome measure)?
Thanks for sharing your sample size considerations so transparently in the Supplementary. I’m just not quite sure how you come to the conclusion that the expected decrease in performance in the dual-task condition (difference in number of trials, or increase in RT) is 5 trials or 2 sequences. Could you elaborate on that a bit more in the manuscript?
Will you apply some kind of manipulation check that participants actually tried to stick to the secondary task? One could imagine some participants only focusing on the primary task, resulting in low performance on the secondary task but a low effect on working memory load as well. You state that you will record accuracy of recall for both secondary tasks, but are you planning to control for this by excluding participants not reaching a certain level of performance on the secondary task?
Do I understand your design correctly: one participant will have both secondary tasks alongside each primary task so that the effect of the respective secondary tasks can be compared within participants?
In my view, the section on planned analyses lacks specification: Could you please lay out a decision tree, starting with testing the assumptions of your planned statistical analysis and then elaborating which changes you will make if the assumptions are not met (e.g., data transformation, robust hypothesis tests, etc.). Generally, I miss a section on your planned pre-processing pipelines, as these can heavily influence your results and should therefore be pre-registered as well. Will you only consider reaction times or accuracies as well?
Please consider possible limitations like the fact that you’re only using single-digit material which doesn’t allow for generalizations to multi-digit material. What if your material is too easy, resulting in ceiling effects and low variance?
As I believe the PCI design table to be very helpful in thinking through possible outcomes, it would be great if you included these considerations not only in the Supplementary but in an anticipated discussion section as well. One point regarding the design template: I think it would be more straightforward if you enumerated the three RQ3 as RQ3a, RQ3b and RQ3c as I was a bit surprised by the repetition of RQ3 which is only explained by the differences in the hypothesis column.
I’m looking forward to seeing the results of your interesting study.