Evaluating adaptive and attentional accounts of sensorimotor effects in word recognition memory

Based on reviews by Gordon Feld and Adam Osth
A recommendation of:

Sensorimotor Effects in Surprise Word Memory – a Registered Report

Submission: posted 31 January 2023
Recommendation: posted 24 September 2023, validated 24 September 2023
Cite this recommendation as:
Sreekumar, V. (2023) Evaluating adaptive and attentional accounts of sensorimotor effects in word recognition memory. Peer Community in Registered Reports.


Words have served as stimuli in memory experiments for over a century. What makes some words stand out in memory compared to others? One plausible answer is that semantically rich words are more distinctive and therefore exhibit a mirror effect in recognition memory experiments, where they are more likely to be correctly endorsed and also less likely to be confused with other words (Glanzer & Adams, 1985). Semantic richness can arise from extensive prior experience with the word in multiple contexts but can also arise from sensorimotor grounding, i.e., direct perceptual and action-based experience with the concepts represented by the words (e.g., pillow, cuddle). However, previous experiments have revealed inconsistent recognition memory performance patterns for words based on different types of sensorimotor grounding (Dymarska et al., 2023). Most surprisingly, body-related words such as cuddle and fitness exhibited greater false alarm rates.

In the current study, Dymarska and Connell (2023) propose to test two competing theories that can explain the increased confusability of body-related words: 1) the adaptive account - contextual elaboration-based strategies activate other concepts related to body and survival, increasing confusability; and 2) the attentional account - somatic attentional mechanisms automatically induce similar tactile and interoceptive experiences upon seeing body-related words leading to less distinctive memory traces. The adaptive account leads to different predictions under intentional and incidental memory conditions. Specifically, contextual elaboration strategies are unlikely to be employed when participants do not expect a memory test and therefore in an incidental memory task, body-related words should not lead to inflated false alarm rates (see Hintzman (2011) for a discussion on incidental memory tasks and the importance of how material is processed during memory tasks). However, the attentional account is not dependent on the task instructions or the knowledge about an upcoming memory test. 

Here, Dymarska and Connell (2023) have designed an incidental recognition memory experiment with over 5000 words, disguised as a lexical decision task using carefully matched pseudowords during the encoding phase. The sample size will be determined by using a sequential hypothesis testing plan with Bayes Factors. To test the predictions of the adaptive and attentional accounts, the authors derive a set of lexical and sensorimotor variables (including a body-component) after dimensionality reduction of a comprehensive set of lexical and semantic word features. The analysis will involve running both Bayesian and frequentist hierarchical linear regression to explain four different measures of recognition memory performance based on the key sensorimotor variables and other baseline/confounding variables. While this analysis plan enables a comparison with the earlier results from an expected memory test (Dymarska et al., 2023), the current study is self-contained in that it is possible to distinguish the adaptive and attentional accounts based on the effect of body component scores on hit rate and false alarm rate.

The study plan was refined across two rounds of review, with input from two external reviewers, after which the recommender judged that the study satisfied the Stage 1 criteria for in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol:
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
Dymarska, A. & Connell, L. (2023). Sensorimotor Effects in Surprise Word Memory – a Registered Report. In principle acceptance of Version 3 by Peer Community in Registered Reports.

Dymarska, A., Connell, L. & Banks, B. (2023). More is Not Necessarily Better: How Different Aspects of Sensorimotor Experience Affect Recognition Memory for Words. Journal of Experimental Psychology: Learning, Memory, and Cognition. Advance online publication.

Glanzer, M., & Adams, J. K. (1985). The mirror effect in recognition memory. Memory & Cognition, 13, 8-20.

Hintzman, D. L. (2011). Research strategy in the study of memory: Fads, fallacies, and the search for the “coordinates of truth”. Perspectives on Psychological Science, 6(3), 253-271.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Evaluation round #2

DOI or URL of the report:

Version of the report: 1

Author's Reply, 28 Jul 2023

Decision posted 01 Jul 2023, validated 02 Jul 2023

Both reviewers have recommended some more minor revisions. The authors' revisions and/or response to these suggestions are required before a final decision can be made.  


Reviewed by , 13 Jun 2023

All in all, I have the impression that the authors put a lot of work into the revisions and that the Stage 1 manuscript has significantly improved. However, at times I wondered whether they might have made more changes to the manuscript rather than explaining things in the rebuttal that, I think, were already clear to me. Therefore I must ask for some additional revisions that otherwise might have been avoided.

Concerning my point 4: No changes were made to the manuscript. While it is interesting to read how the authors plan to perform sequential sampling in their rebuttal, they must make sure the manuscript is clear on this point, too.

Concerning my point 5: I do not entirely agree with the authors. While they are correct that their Bayesian approach does not require a frequentist power analysis, it does require justifications for choosing borders of the sampling plan. This was the reason for my question and the authors should include such justifications into the manuscript. (Why is 6360 the upper limit?)

Concerning my point 6: Please explicitly state in the manuscript that no outliers will be excluded and that no outlier correction will be performed (e.g., Winsorising) beyond the exclusion criteria.

Concerning my point 7: I do not agree with the authors. Since the results from the published study were not collected in an unbiased environment, effects such as regression to the mean make it likely that the RR will differ from the original study. Any comparison the authors draw between the two datasets will be affected by this. Therefore, I would strongly urge the authors to include the explicit condition. But I would leave it up to the editor to decide whether my concern is crucial to Stage 1 acceptance.

Reviewed by , 31 May 2023

This is a good revision that addresses many of my concerns. My biggest point that I would like to make is that I maintain my position that mixed effects analyses would be useful here. I understand the authors' rationale for wanting to maintain some consistency with a previously published analysis. However, I would recommend doing mixed effects analyses *in addition* to the current analyses to see whether the results are comparable. I would also like to mention that Jeff Rouder and colleagues have developed techniques for performing hierarchical Bayesian mixed effects models that can calculate d' at an item level. The following papers may be useful:

Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application to the theory of signal detection. Psychonomic Bulletin & Review, 12, 573-604.

Pratte, M. S., Rouder, J. N., & Morey, R. D. (2010). Separating mnemonic process from participant and item effects in the assessment of ROC asymmetries. Journal of Experimental Psychology: Learning, Memory & Cognition, 36, 224-232.

The introduction gives a much clearer perspective on the theories of interest. However, I think it would also be useful to include an additional paragraph on some of the methodological issues, such as controls for word frequency and other things that their analyses consider. Otherwise the mentions about these word level controls come in out of nowhere in the Methods section.

Evaluation round #1

DOI or URL of the report:

Version of the report: 2

Author's Reply, 25 Apr 2023

Decision posted 22 Mar 2023, validated 22 Mar 2023

Dear authors,

Thank you for submitting your manuscript for evaluation at PCI:RR. We have now received two helpful and detailed reviews and I believe all of their suggestions are reasonable. The reviewers have raised questions about the sampling plan and some analytical/methodological choices. They have both made suggestions to revise the Introduction and other parts of the manuscript for more clarity on the conceptual issues being tackled by the proposed work. I would request you go through these reviews and revise your manuscript to address the points raised by the reviewers and resubmit the manuscript for further consideration. 



Reviewed by , 20 Mar 2023

1A. The scientific validity of the research question(s)

a) The scientific question seems valid, but somewhat underwhelming. Maybe the authors could give some details about the size of this effect and how it interacts with other important cognitive processes. Currently, the question seems somewhat isolated and I am unsure what we really gain from answering it. I am sure there is an appealing reason to study this effect, but currently the authors do not include this rationale in the Stage 1 report.

1B. The logic, rationale, and plausibility of the proposed hypotheses (where a submission proposes hypotheses)

a) For me it remains somewhat unclear whether Hypothesis 3 really takes care of all scenarios. A potential other explanation for increased false alarms is greater similarity of word meaning between lures and targets for the items with higher Body component scores. Can the researchers rule that out? I would also like them to summarize the other alternative explanations that may be construed and how their design takes care of them.

b) The table at the end of the document was very hard for me to understand. I think it should include all hypotheses and their operationalisations as well as the interpretations.


1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable)


a) The sequential sampling plan is unclear to me. Is it not possible that a BF that has crossed the threshold can return to the indecisive region again, if much evidence against or for an effect is collected? How do you deal with this, since there are five BFs that need to be across the threshold at the same time? As far as I see it, one could cross the threshold but be indecisive again by the time the others have also crossed the threshold. But maybe I misunderstand the procedure.

b) How were the borders of the sequential sampling plan determined? Was any formal power analysis performed? Why not?
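The concern in point a) can be illustrated with a minimal sketch. It assumes a stopping rule under which data collection ends only at an interim check where every Bayes factor is simultaneously at or beyond a threshold. The threshold values (6 and 1/6), the function names, and the example trajectories are hypothetical and are not taken from the Stage 1 protocol.

```python
# Hypothetical evidence thresholds; the protocol's actual values are not stated here.
UPPER = 6.0
LOWER = 1.0 / 6.0


def is_decisive(bf, upper=UPPER, lower=LOWER):
    """A Bayes factor is decisive if it lies at or beyond either threshold."""
    return bf >= upper or bf <= lower


def first_joint_stop(bf_history):
    """Return the index of the first interim check at which all Bayes
    factors are decisive at the same time, or None if that never happens."""
    for step, bfs in enumerate(bf_history):
        if all(is_decisive(bf) for bf in bfs):
            return step
    return None


# Made-up trajectories for two Bayes factors across four interim checks.
history = [
    (7.2, 1.5),  # first BF is decisive, second is not
    (4.1, 3.0),  # first BF has drifted back into the indecisive region
    (6.5, 0.9),  # first BF decisive again, second still indecisive
    (8.0, 0.1),  # both decisive at the same check
]
stop_at = first_joint_stop(history)
```

Under this assumed rule, a Bayes factor that crosses a threshold early and later drifts back does not trigger stopping; only the joint state at a given check matters. Whether the authors' plan works this way is exactly what the reviewer asks them to clarify.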


1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses


a) This was all very clear to me and I have no comments.


1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).

a) This study relies on Prolific, so some additional data quality checks may be good. For example, they could exclude participants with a d-prime below 0.1. In general, some info on how they will treat outliers could be helpful.

b) The authors should consider introducing an explicit memory condition into the study. Currently, they are relying on comparing the results structure from their study to existing data, but a direct comparison within one study would be preferable. This would strengthen the study a lot.
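As a minimal sketch of the quality check suggested in point a), d-prime can be computed per participant from hit and false alarm rates and compared against the 0.1 cutoff. The log-linear correction (adding 0.5 to each count) is an assumption made here to avoid infinite z-scores at rates of 0 or 1; the function names are hypothetical and nothing in this block is taken from the authors' analysis pipeline.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false alarm rate).

    Uses a log-linear correction (an assumption of this sketch) so that
    rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


def passes_quality_check(dp, cutoff=0.1):
    """Keep a participant only if d' is at or above the cutoff."""
    return dp >= cutoff


# A participant responding at chance has d' of 0 and would be excluded.
chance = d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50)
# A participant with 80% hits and 20% false alarms would be retained.
attentive = d_prime(hits=80, misses=20, false_alarms=20, correct_rejections=80)
```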

Reviewed by , 22 Mar 2023

My overall impression of this submission for a Registered Report is fairly positive. I think the authors are considering a high powered study with some relevant controls for their variables of interest. My recommendations for revision are fairly minor.
My largest criticism is whether the hierarchical regression analysis they have chosen is appropriate here. I think the standard for investigating lexical variables is instead to use mixed effects models that not only allow for variability across subjects but can additionally allow for variability across items and even variability in the effect size across items (e.g., varying slopes). It has been the standard to use ever since the influential Clark (1973) article and can be seen in the following investigations of item level effects in recognition memory:
Cox, G. E., Hemmer, P., Aue, W. R., & Criss, A. H. (2018). Information and processes underlying semantic and episodic memory across tasks, items, and individuals. Journal of Experimental Psychology: General, 147(4), 545–590.
Freeman, E., Heathcote, A., Chalmers, K., & Hockley, W. (2010). Item effects in recognition memory for words. Journal of Memory and Language, 62, 1-18.
Another important point concerns the sample size. 2,120 participants is really admirable for their standard. However, 20 participants per word means about 10 target and 10 lure trials per word, which is not extensive and can still lead to a lot of variability at the item level. The authors could consider using longer lists of words and/or more participants to up this.
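To make the item-level concern concrete, the following minimal sketch computes how many target and lure observations each word receives and the resulting standard error of an item-level hit or false alarm rate. It assumes an even split between target and lure presentations and a simple binomial standard error; the function names are hypothetical.

```python
import math


def trials_per_word(participants_per_word, target_share=0.5):
    """Split the observations for one word into target and lure trials.

    The even split (target_share=0.5) is an assumption of this sketch.
    """
    targets = participants_per_word * target_share
    lures = participants_per_word - targets
    return targets, lures


def proportion_se(p, n):
    """Binomial standard error of a proportion p estimated from n trials."""
    return math.sqrt(p * (1.0 - p) / n)


targets, lures = trials_per_word(20)     # 10 target and 10 lure trials
se_at_10 = proportion_se(0.5, targets)   # roughly 0.16 for a rate near 0.5
se_at_30 = proportion_se(0.5, 30)        # roughly 0.09 with three times as many trials
```

With only 10 observations, a hit or false alarm rate near 0.5 carries a standard error of about 0.16, which is the kind of item-level noise that longer lists or more participants would reduce.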
Finally, it’s very clear that one of the big problems with this line of work is the correlations between the different lexical variables. While I admire the fact that the authors are including word frequency in their comparison, one of the current gold standards is actually contextual variability. Have the authors considered using this measure? One of the leaders in developing newer measures of context variability is Brendan Johns, and he had a recent paper demonstrating the advantages of these measures:
Johns, B. T. (2022). Accounting for item-level variance in recognition memory: Comparing word frequency and contextual diversity. Memory & Cognition, 50, 1013-1032.
One of the points he also emphasizes in this paper is that there is actually a quadratic relationship between frequency and/or context variability and performance, which the authors may want to consider.
Another point – it wasn’t clear whether any of the measures that the authors considered were correlated with word frequency or any other predictor they used. The Introduction should make this clear – it would be very easy to report and discuss a correlation matrix.
My other comments are extremely minor and easily addressable in a revision. They mainly comprise revisions of the Introduction and some clarity of the theoretical issues.
Minor comments:
I found the description of the various theoretical mechanisms somewhat puzzling. They seem to pop up at various points as explanations for relevant phenomena. I think it might make more sense to describe some of the underlying theory and/or competing theories initially and then describe the perplexing and contradictory effects reported in the literature.
“semantically-rich, distinctive words tend to facilitate recognition memory in the classic mirror pattern…” (p2). I’m not sure what the authors are referring to here, whether this is the advantage for low frequency words or for concrete words. Regardless, I don’t think it’s at all clear that the advantages reported were because the words are “semantically rich.” The causes of the word frequency effect are still debated in theoretical models today! For instance, Dennis and Humphreys (2001) argued that word frequency effects are just because of frequency – higher frequency words were experienced in more contexts and thus produce more interference. This says nothing about there being differences in the words’ semantic content.
“higher scores in this component made no difference to either hits or false alarms, which Dymarska et al. suggest may be due to lack of distinctiveness in communication-related words.” (p3) How do we know there was a lack of distinctiveness? Even if the effect was found, how would we know that this was specifically due to distinctiveness? I don’t think that’s necessarily clear without an independent definition or theoretical conception of distinctiveness. I’m not saying there isn’t one – it’s possible to define distinctiveness as isolation in some type of representational space – but you cannot conclude that performance advantages are necessarily due to distinctiveness. It’s possible that other factors – such as just having more features or stronger encoding of said features – could be responsible.
“Instead of producing the semantic richness pattern of increased hits and fewer false alarms…” (p3) This is just the mirror effect, not a “semantic richness” effect.
“…when participants are not aware they will be later tested on their memory for presented words… such elaboration is far less likely, and offers us an opportunity to adjudicate between theoretical accounts.” (p5-6) I am a fan of the approach that the authors are taking and I like surprise memory tests. However, I didn’t think this statement made a lot of sense. How do the different theoretical accounts require this manipulation? How does elaboration change their predictions? I think this statement should be made clearer.
It's also important to note that the usage of a lexical decision task can still produce strategies. In fact, the nature of the encoding task has a huge effect on performance and can even change the nature of the word frequency effect – see the following paper:
Criss, A. H., & Shiffrin, R. M. (2004). Interactions Between Study Task, Study Time, and the Low-Frequency Hit Rate Advantage in Recognition Memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(4), 778–786.
Adam Osth
