
Shedding light on task influence in the SNARC effect

Shape of SNARC: How task-dependent are Spatial-Numerical Associations? A highly powered online experiment
Recommendation: posted 26 February 2025, validated 03 March 2025
Dalmaso, M. (2025) Shedding light on task influence in the SNARC effect. Peer Community in Registered Reports. https://rr.peercommunityin.org/PCIRegisteredReports/articles/rec?id=794
Recommendation
List of eligible PCI RR-friendly journals:
- Advances in Cognitive Psychology
- Collabra: Psychology
- Cortex
- Experimental Psychology *pending editorial consideration of disciplinary fit
- Journal of Cognition
- Peer Community Journal
- PeerJ
- Psychology of Consciousness: Theory, Research, and Practice *pending editorial consideration of disciplinary fit
- Royal Society Open Science
- Studia Psychologica
- Swiss Psychology Open
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Reviewed by Peter Wühr, 24 Feb 2025
Evaluation round #2
DOI or URL of the report: https://osf.io/4wpv6/files/osfstorage
Version of the report: V2
Author's Reply, 20 Feb 2025
Decision by Mario Dalmaso, posted 08 Jan 2025, validated 09 Jan 2025
I have received the comments from the three reviewers and would like to thank them for their time and effort.
Two reviewers are pleased with the revisions you made. One noted a minor issue with grammar and bracket positions, which you may want to address. The other had no further comments and recommended acceptance.
The remaining reviewer suggested considering a potential issue with variability in category boundaries in the parity-judgment task. He proposed two approaches to explore this but left the decision to you. As the editor, I also leave it to you to address this point as you see fit. For the next revision, I will seek further input from this reviewer to assist in reaching a final decision regarding this first stage.
I look forward to your next version.
Reviewed by Michele Vicovaro, 02 Jan 2025
I thank the authors for having carefully considered and successfully addressed all of my previous comments. I just have a very minor point: there is a need to check the grammar and the bracket positions in the last sentence before the "SNARC and MARC compatibility" section on page 7 (in the manuscript with tracked changes).
Thank you for the opportunity to review this very interesting work. I look forward to seeing the full results!
Reviewed by Christian Seegelke, 18 Dec 2024
The authors have sufficiently addressed all my comments and I happily recommend acceptance of this proposal.
Reviewed by Peter Wühr, 15 Dec 2024
Evaluation round #1
DOI or URL of the report: https://osf.io/zm9dk
Version of the report: V1
Author's Reply, 29 Nov 2024
Decision by Mario Dalmaso, posted 12 Sep 2024, validated 12 Sep 2024
Firstly, I apologise for the extended delay due to the challenge of finding three qualified and expert reviewers during this period. I want to express my sincere gratitude to the three reviewers (reviewer 3 is Peter Wühr) for their valuable comments.
All three reviewers liked the proposal and agreed that this study will provide important insights into the literature, and I agree with their assessment.
As you will see, the reviewers provided several practical suggestions. In particular, the first reviewer requested clearer hypotheses and more discussion on the MC-SNARC and PJ-SNARC correlation, raising concerns about the reliability due to the low consistency of SNARC effects. The second reviewer suggested focusing the introduction more on the study’s main goal and also commented on the proposed Bayesian analyses. The third reviewer asked for a clearer explanation of the dual-route model and proposed additional, exploratory analyses to explore whether SNARC effects change over time, particularly between tasks.
I would encourage you to resubmit the proposal by taking into account the suggestions made by the three reviewers.
Reviewed by anonymous reviewer 1, 17 Jul 2024
This pre-registered study aims to explore significant research questions with appropriate and fully reproducible methodologies. While the SNARC effect is a well-known phenomenon studied using various approaches and perspectives, some critical aspects remain underexplored. These include the exact ‘shape’ of the effect, specifically which mathematical model best captures the relationship between numerical magnitude and dRTs, and the relationship between the SNARC effect obtained through parity judgment and magnitude comparison tasks. This study aims to address these two important issues, potentially providing a valuable contribution to the literature.
The procedure for sample size determination and the analysis plan are robust, requiring no further suggestions for improvement. I would like to commend the authors for the robust and detailed description of the criteria for sample size determination and the analysis plan. This thoroughness sets a valuable example for enhancing the methodological rigor of studies in this research field. However, there are some suggestions to improve the clarity of the hypotheses definition.
1. On page 5, a reference to the concept of ‘primitive’ is made without a specific definition. In this context, ‘primitive’ seems to imply an ‘automatically activated process.’ However, a precise definition of this concept is needed to enhance clarity.
2. On page 6, at the end of the section entitled ‘Relevance of number magnitude and parity,’ a discussion of the distance effect is notably absent.
3. I have several concerns regarding the content of the ‘Correlation between MC- and PJ-SNARC’ section:
- Before discussing any predictions about the possible correlation between the MC- and PJ-SNARC, it should be explicitly acknowledged that previous studies have shown that the test-retest reliability of the SNARC effect is quite poor for both the MC task (Hedge et al., 2018) and the PJ task (Viarouge et al., 2014). As explained by Hedge et al. (2018), this low reliability likely has nothing to do with the stability or instability of the cognitive processes underlying the overt effect (i.e., the SNARC), but rather is related to the low inter-individual variability of the effect itself. In simple terms, low inter-individual variability imposes an upper limit on the magnitude of test-retest reliability.
- The correlation between PJ- and MC-SNARC can be expected to be, at most, as large as the test-retest reliability observed for a single task. However, it seems safer to predict that it might be even smaller due to superficial differences between the tasks. In light of this, it does not seem legitimate to test different theoretical predictions based on the observed correlation coefficient. For instance, a high correlation appears implausible regardless of the correctness of the MNL theory, and a low correlation is expected regardless of whether the dual-route or the WM accounts are correct or not.
- The authors briefly acknowledge this potential issue at the end of the section. This is confusing because if the authors are aware that the correlation between MC- and PJ-SNARC is likely to be small, then predicting that it might be large or moderate is illogical. In other words, comparing different theories based on the observed correlation coefficient seems impossible from any perspective.
- Apart from these crucial issues, I have concerns about the consistency of some theory-driven predictions regarding the correlation coefficient. For instance, the authors suggest that, according to the hypothesis that the MC- and PJ-SNARC are related to different WM systems (verbal or visuospatial), no correlation is expected. However, it seems reasonable to expect some correlation between verbal and visuospatial WM capacities. At the very least, some positive correlation is expected due to the superficial similarities between the tasks, as some general cognitive skills are likely involved in both tasks.
- In light of these concerns, I think that this section requires substantial revision. I suggest that the authors focus on the methodological issues related to interpreting the correlation between MC- and PJ-SNARC, refraining from making any theory-based predictions. Alternatively, if predictions about the correlation coefficient are discussed, it should be made clear that methodological issues render the comparison practically impossible.
- Additionally, I recommend the authors discuss an important difference between the present study and previous studies on the test-retest reliability of the SNARC effect. Unlike previous studies, this study does not include a ‘washing-out’ period between the two tasks. This procedural difference should be adequately considered when interpreting the results.
- Lastly, it appears that the discussion in this section does not translate into a specific analysis plan (see pages 19-20). Therefore, clarification is needed regarding the exact role of the correlation between PJ- and MC-SNARC within the broader analysis plan.
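The reviewer's upper-bound argument (points made above about test-retest reliability capping the observable MC-/PJ-SNARC correlation) follows the classical attenuation formula from test theory: an observed correlation cannot exceed the geometric mean of the two measures' reliabilities. A minimal sketch, using purely illustrative reliability values rather than any empirical estimates:

```python
import math

def max_observed_correlation(rel_a, rel_b):
    """Upper bound on an observed correlation between two measures,
    given their reliabilities (classical attenuation logic)."""
    return math.sqrt(rel_a * rel_b)

# Hypothetical reliabilities for the MC- and PJ-SNARC; even a perfect
# true correlation could then not appear larger than this bound.
bound = max_observed_correlation(0.5, 0.4)
```

With these illustrative values the bound is about 0.45, which makes concrete why a "high" observed correlation would be implausible regardless of the underlying theory.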
4. At the end of page 14, the authors suggest that using a categorical predictor for quantifying the MC-SNARC avoids systematic underestimation of the effect size and the correlation with other measures. However, as long as the quantification of the MC-SNARC relies on the unstandardized regression coefficient (b), as in most studies, switching from a continuous to a categorical predictor is unlikely to influence the effect size. The effect size is determined by the inter-individual variability of b, not by the dispersion of the data points around the model predictions. It should be clarified that the type of predictor may affect the effect size only if the MC-SNARC is quantified using the standardized regression coefficient (r).
- Apart from this, a critical point is missing: by increasing the fit of the model to the observed data, the use of a categorical predictor would likely increase the precision of the effect size estimate, which is highly desirable, regardless of whether it leads to an increase or a decrease in the effect size itself.
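The distinction drawn in this point can be illustrated with simulated data: when the true dRT pattern is step-shaped, switching from a continuous to a categorical predictor changes the standardized fit (R²) but not the information carried by the unstandardized slope. This is a hypothetical sketch with idealized, noise-free dRTs, not the authors' analysis code:

```python
import numpy as np

# Idealized "categorical" SNARC pattern: dRT depends only on whether
# the digit is below or above the (absent) reference 5.
digits = np.array([1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
drt = np.where(digits < 5, -20.0, 20.0)  # dRTs in ms, step-shaped

def r_squared(x, y):
    """R-squared of a simple least-squares regression of y on x."""
    b, a = np.polyfit(x, y, 1)           # slope, intercept
    resid = y - (a + b * x)
    return 1.0 - resid.var() / y.var()

# Continuous predictor: the digit itself.
r2_continuous = r_squared(digits, drt)
# Categorical predictor: -1 for "small", +1 for "large".
category = np.where(digits < 5, -1.0, 1.0)
r2_categorical = r_squared(category, drt)
```

Here the categorical predictor fits the step pattern perfectly (R² = 1) while the continuous predictor does not (R² ≈ 0.83), illustrating that the choice of predictor affects standardized fit, and hence the precision of the estimate, rather than the unstandardized slope per se.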
5. Page 19. The authors predict that the SNARC effect should be stronger in the second task due to the pre-activation of the spatial mapping of numbers. However, this prediction seems inconsistent with one of the most credited hypotheses regarding the ‘categorical’ shape of the MC-SNARC, which posits a direct relationship between RTs and SNARC magnitude. Since RTs are generally expected to be longer in the first task than in the second (due to a practice effect), it seems reasonable to predict that the SNARC effect should be stronger in the first task rather than the second. Both hypotheses appear logically sound; therefore, I recommend discussing and testing both (see also point 2 on page 20 and the Study Design Table). Testing both hypotheses should not change much in terms of sample size and analysis plan, since a two-sided t-test is already planned.
6. On page 25, it is reported that participants will be asked about their level of mathematical skills. Could the authors please motivate this choice and explain how this information will be used in the context of the present study?
Reviewed by Christian Seegelke, 11 Sep 2024
Summary
The main aim of the proposed experiment is to examine differences and commonalities in the SNARC effect(s) using two commonly used tasks (i.e., magnitude classification (MC) and parity judgment (PJ)). Specifically, it aims to confirm whether the typically observed difference in the shape of the SNARC effect between the two tasks (i.e., a continuous SNARC for PJ, a categorical SNARC for MC) holds in a large sample, and to assess potential correlations between the two tasks. To this end, the authors propose to conduct a large online study, an approach already proven suitable for assessing the SNARC effect.
Evaluation
I appreciate the study's aim of assessing SNARC effects in the two most common tasks in a large sample. The research questions are valid, and the proposed hypotheses seem plausible. Further, the outlined analysis plans are sound, power calculations are provided, and the methods are described clearly and in sufficient detail to allow for replication. Data handling (i.e., cleaning, outlier removal, etc.) is also described, critical manipulation checks are stated, and the Bayesian statistics allow quantification of evidence in favor of the null hypothesis.
That said, my only serious concern with the current proposal is that the framing of the introduction is not tailored to the research question(s). In my view, this study is for the most part a replication attempt of the SNARC effect in two well-established tasks in a large sample, while also providing a more suitable assessment of the effects (with the modelling of magnitude as either a continuous or a categorical predictor). However, in the introduction the authors mention many different theories (e.g., MNL, dual-route models, WM-account, polarity-correspondence), but it is my impression that the data gained from this study cannot contribute much in terms of theoretical advancement. For example, doesn't the fact that the MC and the PJ produce different SNARC shapes contradict the idea of a representation of a (continuous) mental number line? This is not to say that I see no merit in the study; I only think that the intro could be streamlined to better reflect the purpose of the study.
Finally, I have a methodological suggestion the authors might consider. I very much appreciate the Bayesian approach to statistics, but I was wondering whether the (potentially) different shapes of the SNARC effect could be evaluated more directly using a model comparison approach (as opposed to running t-tests on extracted R² values). For example, Bayesian regression models could be specified using the brms package in R (Bürkner, 2017), and model comparison could be done using leave-one-out cross-validation with the loo package (Vehtari et al., 2017). Alternatively, they could use BIC or AIC values as a model selection criterion. I just find these approaches more conventional and straightforward.
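The BIC-based variant of this model-comparison idea can be sketched briefly. This is a rough, non-Bayesian illustration in Python (rather than the R packages named above), using simulated data for a single hypothetical participant; the variable names and noise levels are assumptions, not values from the proposal:

```python
import numpy as np

# Simulated dRTs for one participant: step-shaped truth plus trial noise.
rng = np.random.default_rng(1)
digits = np.repeat([1, 2, 3, 4, 6, 7, 8, 9], 20).astype(float)
true_drt = np.where(digits < 5, -20.0, 20.0)               # step-shaped truth
drt = true_drt + rng.normal(0.0, 15.0, size=digits.size)   # add trial noise

def bic_ols(x, y, k=2):
    """BIC of a straight-line OLS fit of y on x, with k fitted parameters:
    n * ln(RSS / n) + k * ln(n)."""
    n = y.size
    coeffs = np.polyfit(x, y, 1)                           # slope, intercept
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

bic_linear = bic_ols(digits, drt)                          # continuous predictor
category = np.where(digits < 5, -1.0, 1.0)
bic_step = bic_ols(category, drt)                          # categorical predictor
# Lower BIC indicates the preferred model; with step-shaped data,
# the categorical model should win.
```

A full Bayesian treatment with brms/loo would replace the BIC values with expected log predictive densities, but the selection logic is the same: fit both shape models and compare a single criterion rather than t-testing extracted R² values.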
Reviewed by Peter Wühr
, 26 Aug 2024
Review of: "Shape of SNARC: How task-dependent are spatial-numerical associations? A highly powered online experiment."
Authors: Roth, Cipora, Overlander, Nuerk, Reips.
Submitted to: PCI RR
Summary: The authors submitted a proposal (stage 1 submission) for a highly powered online experiment addressing similarities and differences of spatial-numerical associations of response codes (SNARC) effects in two different tasks. For investigating the SNARC effect, the majority of studies have used either a magnitude classification task (MCT), in which participants classify number stimuli as smaller or larger than a reference (e.g., 5), or a parity-judgment task (PJT), in which participants classify number stimuli as odd or even. Although the SNARC effect is usually obtained in both types of tasks, they differ in processing requirements (e.g., the requirement to process number magnitude), and several differences in the observed SNARC effects have been reported. In this registered report, the authors propose a highly powered online experiment to investigate, and compare, the shape of SNARC effects in the MCT and PJT, their potential correlation, and further effects of task features (e.g., task order, mapping order) on the size and shape of the SNARC effect. In the experiment, the authors intend to test a sample of 1,700 participants in standard versions of the MCT and the PJT. The experiment will have a 2 (task) x 2 (mapping/compatibility) within-subjects design. In addition, the orders of tasks and mappings will be independently varied between participants, and then used in some analyses (on task order and compatibility order effects). Several manipulation checks are planned before the main analyses will be performed. Moreover, instead of relying on NHST, the authors will use Bayesian t-tests to evaluate evidence for both the null and the alternative hypothesis.
Evaluation: The SNARC effect is among the most investigated phenomena in cognitive psychology, and has attracted researchers from many different disciplines. Nevertheless, there are still open issues concerning (a) differences in the requirements of the experimental tasks most often used for investigating the SNARC effect, (b) the impact of basic design features (e.g., task order) on the SNARC effect, and (c) the robustness of differences in the characteristics of SNARC effects obtained with different tasks. The authors' idea of investigating these issues in a highly powered online experiment makes perfect sense, and the results may clarify important methodological and theoretical issues. Hence, I have no doubts about the scientific validity of the research questions. Moreover, the to-be-tested hypotheses are plausible and well justified on the basis of the literature. The authors conducted a careful and extensive power analysis for determining sample size, and they have expertly planned and described the methodologies for data collection and data analysis. In fact, the methods are described in sufficient detail to allow for close replication of the proposed study procedures and analyses, and to prevent undisclosed flexibility in the procedures and analyses. In summary, I cannot see a major issue that would prevent me from recommending acceptance of this proposal. Yet, I would like the authors to think about revising their description of dual-route models, and I would like to suggest additional (exploratory) analyses.
Specific comments:
(1) I did not fully understand the description of the dual-route model of the SNARC effect, and the implications for SNARC effects obtained in MCT versus PJT, as described on page 11. For example, I did not understand the statement that “number magnitude taking the fast unconditional route in PJ should not interfere much with number parity taking the slow conditional route.” In fact, number magnitude taking the fast unconditional route must interfere with the processing of parity in the conditional route (or with the outcome of this processing), because otherwise we would not observe any SNARC effects here.
In my recollection, the dual-route model of the SNARC proposed by Gevers et al. (2006) is a variant of the dual-route model proposed by Kornblum et al. (1990) for explaining spatial and other S-R compatibility effects. As correctly stated by the authors, dual-route models distinguish two parallel processing routes from stimuli to responses: a controlled (conditional) route and an automatic (unconditional) route. Kornblum et al. applied this model to spatial compatibility effects and argued that both routes contribute to spatial S-R compatibility effects if stimulus location is relevant for the task, whereas only the automatic route contributes to spatial S-R compatibility effects (called the 'Simon' effect) if stimulus location is irrelevant for the task. In particular, the spatial stimulus location will always automatically activate the spatially corresponding response, which facilitates performance when the corresponding response is the correct response, but impedes performance when the corresponding response is actually incorrect. When, however, stimulus location is relevant, a second influence on performance results from a variation of the S-R mapping between stimulus location and response location. Here, Kornblum et al. (1990) argued that processing of the compatible mapping by the controlled route is easier, or more efficient, than processing of the incompatible mapping, producing another source of the compatibility effect. Hence, when stimulus location is relevant, both the controlled route (more efficient processing of the compatible relative to the incompatible mapping) and the automatic route (automatic activation of the spatially corresponding response) will contribute to the spatial compatibility effect. This framework predicts, first, that spatial compatibility effects should be stronger when location is relevant than when location is irrelevant. Moreover, this framework also predicts (moderate?) correlations between compatibility effects in tasks with stimulus location being relevant and tasks with stimulus location being irrelevant, since both tasks (or effects) have the automatic effect in common.
If we apply the dual-route logic to the SNARC effect, and to the design of the experiment proposed here, we would also assume two sources of the SNARC effect in the MCT, but only one source of the SNARC effect in the PJT. In particular, in both tasks, small numbers should automatically activate the left response, and large numbers should automatically activate the right response, producing an 'automatic' (Simon-like) SNARC effect in both tasks. In addition, one might argue that the controlled route contributes to the SNARC in the MCT, but not in the PJT. Therefore, one would have to assume that processing of the compatible (number-location) mapping is easier, or more efficient, than processing of the incompatible (number-location) mapping. Since the relevant S-R mapping and the (irrelevant) S-R correspondence are perfectly correlated in the MCT, both mechanisms would contribute to SNARC effects in this task. In contrast, in the PJT, only the automatic effects of irrelevant number-location correspondence would drive the SNARC effect. There is also a variation of S-R mapping (between parity and response location) in the PJT of the present study, but this manipulation is orthogonal to the irrelevant number-location correspondence, and should therefore not (directly) affect the SNARC effect. Hence, in my view, a dual-route model would also predict (a) larger SNARC effects in the MCT than in the PJT, and (b) moderate correlations between SNARC effects in both tasks due to the common influence of the automatic route.
(2) I would like to suggest additional exploratory analyses addressing the issue of differences in the shape and size of SNARC effects between the MCT and PJT. These additional analyses would compare the size and shape of SNARC effects in earlier and later parts of the experiment, and could possibly inform about the time course of implicit magnitude classification processes in the PJT. In the following, I will sketch some arguments why such an analysis might be interesting.
I believe that the different shape of the effects mostly reflects different task requirements, but it is possible that the shape of the effects, particularly in the PJT, changes during the course of the experiment. In particular, the MCT explicitly requires participants to classify the numbers into two categories, the "small" (or "smaller than five") category and the "large" (or "larger than five") category. Hence, it does not seem surprising that this (task-dependent) classification of stimulus numbers is stronger than the task-independent processing of numerical size, and is therefore also reflected in the shape of the resulting SNARC effect. In contrast, the PJT does not require any explicit classification of stimuli according to number magnitude. Therefore, task-independent processing of numerical size - although being irrelevant for the task at hand - may occur and produce a (smaller) SNARC effect with a more linear shape than observed in the MCT. Yet, it might be possible that participants (also) begin to classify the stimuli in the stimulus set as "small" (or "smaller than five") versus "large" (or "larger than five") later in the experiment, when they have become familiar with the stimulus set. In other words, during the PJT participants might either discover that the set actually consists of two groups separated by the missing number "5", or have some natural tendency to classify stimulus sets with regard to some salient referent (e.g., the median or the modal value). Hence, it might be possible that the shape of the SNARC effect changes from more "linear" at the beginning of the experiment to more "categorical" at the end of the experiment. Therefore, it might be interesting to compare the shape of the SNARC effects (particularly in the PJT) between the first and second half of the experiment.
An alternative hypothesis might be that participants quite early start to inadvertently classify stimuli according to their magnitude in the PJT as well. This assumption seems plausible given the fact that SNARC effects in the PJT are mainly driven by relative numerical size (i.e., relative size in the stimulus set) rather than absolute size (e.g., Dehaene et al., 1993; Ben Nathan et al., 2009). Yet, the implicit magnitude classification in the PJT may affect the shape of the SNARC effect less strongly than the explicit magnitude classification in the MCT, leaving room for differences between numbers within the same category. If this hypothesis was correct, SNARC effects in the PJT should not occur from the very beginning of the experiment, but develop during the first blocks, because participants need some time to become familiar with the stimulus set, which is the prerequisite for the implicit classification process. Hence, it might be interesting to analyze both the size and shape of the SNARC effect in the PJT as a function of experimental blocks. In contrast, one could assume that the size and shape of the SNARC in the MCT do not vary much across blocks, because the explicit magnitude classification task quickly familiarizes participants with the stimulus set, and the resulting categories may dominate the shape of the SNARC from early trials on.
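The block-wise analysis proposed above could be sketched as one SNARC slope per block: regress dRT on digit magnitude separately for each experimental block and track the slope over time. A minimal illustration with made-up, stand-in dRT patterns (the block numbers and values are hypothetical, not from the proposal):

```python
import numpy as np

def snarc_slope(digits, drt):
    """Unstandardized slope b of dRT regressed on digit magnitude;
    a negative b is the canonical SNARC signature."""
    b, _ = np.polyfit(digits, drt, 1)
    return b

digits = np.array([1, 2, 3, 4, 6, 7, 8, 9], dtype=float)
# Stand-in per-block dRT patterns: a flat pattern in an early block and
# a linear SNARC pattern in a later block.
blocks = {
    1: np.zeros_like(digits),
    4: -5.0 * (digits - 5.0),
}
slopes = {blk: snarc_slope(digits, d) for blk, d in blocks.items()}
```

Comparing such per-block slopes (and, analogously, per-block R² values for the linear versus categorical fits) would directly test whether the PJT effect grows or changes shape across blocks.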
References
Ben Nathan, M., Shaki, S., Salti, M., & Algom, D. (2009). Numbers and space: Associations and dissociations. Psychonomic Bulletin & Review, 16(3), 578–582.
Dehaene, S., Bossini, S., & Giraux, P. (1993). The mental representation of parity and number magnitude. Journal of Experimental Psychology: General, 122(3), 371–396.
Gevers, W., Verguts, T., Reynvoet, B., Caessens, B., & Fias, W. (2006). Numbers and space: A computational model of the SNARC effect. Journal of Experimental Psychology: Human Perception and Performance, 32(1), 32–44.
Kornblum, S., Hasbroucq, T., & Osman, A. (1990). Dimensional overlap: Cognitive basis for stimulus-response compatibility--A model and taxonomy. Psychological Review, 97(2), 253–270.
Signed Review (Peter Wühr)