Recommendation

Does reading out loud influence semantic encoding?

Chris Chambers, based on reviews by Miguel Vadillo and 2 anonymous reviewers
A recommendation of:

The role of semantic encoding in production-enhanced memory

Submission: posted 30 January 2023
Recommendation: posted 27 October 2024, validated 27 October 2024
Cite this recommendation as:
Chambers, C. (2024) Does reading out loud influence semantic encoding? Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=382

Recommendation

The production effect is an intriguing memory phenomenon in which recall and recognition are improved when people read and study words aloud rather than silently. Although robust to a range of contexts, materials and manipulations, the underlying mechanisms that cause the production effect remain to be fully understood, largely due to the wide range of processes that are engaged during speaking compared to silent reading.
 
In the current study, Roembke and Brown (2024) ask whether semantic encoding – the encoding of new information based on its meaningful characteristics rather than sensory/perceptual characteristics – is a driving factor in production-enhanced memory. Across two carefully-controlled experiments in bilingual participants, the authors will test the hypothesis that the production effect should persist when items are matched in semantic but not other features at learning and recognition stages. If semantic encoding at least partially underpins the production effect, then they expect to observe it both when recognition items are presented as pictures or translations (their semantic recognition condition), and when recognition items match those at learning (their veridical recognition condition in which the same written words are presented at learning and recognition). Assuming also that the production effect does not rely exclusively on semantic encoding, the authors expect the production effect to be reduced in the semantic recognition conditions relative to veridical conditions in which words are matched on multiple linguistic features. 
 
The results of these experiments hold important implications for theoretical models of production-enhanced memory. If the authors find that the production effect persists when studied words can be recognised on their semantic features then this would suggest that production influences semantic encoding, which would in turn support theoretical models proposing that speaking engages modality-independent associations with semantic features. On the other hand, if no production effect is observed when participants are asked to recognise pictures or translations, this would raise the possibility that production may have little or no influence on semantic encoding, which would instead support alternative theories suggesting that speaking adds only modality-dependent features to memory traces.
 
The Stage 1 manuscript was evaluated over two rounds of in-depth review. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
 
URL to the preregistered Stage 1 protocol: https://osf.io/qc6rz
 
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA. 
 

References
 
Roembke, T. C. & Brown, R. M. (2024). The role of semantic encoding in production-enhanced memory: A registered report. In principle acceptance of Version 3 by Peer Community in Registered Reports. https://osf.io/qc6rz
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Evaluation round #2

DOI or URL of the report: https://osf.io/x6q3p?view_only=e81e8f3256e84c8e9abb783b30879943

Version of the report: 2

Author's Reply, 27 Sep 2024

Decision by Chris Chambers, posted 10 Jul 2024, validated 10 Jul 2024

Two of the reviewers from the previous round were available to evaluate your revised manuscript, and the good news is that both are satisfied and recommend IPA. There is just one remaining issue to consider in Miguel Vadillo's review, concerning the contingency plan for the d-prime calculations in the event of individual hit rates or false alarm rates of zero or one. As you are probably aware, there are various possible ways you could address this, such as excluding and replacing such participants, applying an adjustment to the d-prime calculation (e.g. as proposed by Hautus, 1995) or using a non-parametric alternative (such as A-prime). I will leave you to consider these possibilities.
 
Due to the July-August shutdown, you won't be able to submit this revision the usual way. Instead, please email us (at contact@rr.peercommunityin.org) with:
 
  1. A response to the reviewer/recommender (attached to the email as a PDF)
  2. The URL to a completely clean version of the revised manuscript on the OSF
  3. The URL to a tracked changes version of the revised manuscript on the OSF
In the subject line of the email please state the submission number (#382) and title. We will then submit the revision on your behalf.
 
Upon receipt of your revised manuscript and response, I anticipate being able to award in-principle acceptance without further Stage 1 review.

Reviewed by anonymous reviewer 1, 08 Jul 2024

I would like to thank the authors for their careful consideration of my comments. After thoroughly reading the manuscript and the responses to my and the other reviewers' points, I am satisfied with the new version of the manuscript and look forward to seeing the experiment conducted. The authors might be interested in a recently published mathematical account, which might help in the discussion section of the manuscript once the data are collected. It is part of the special issue on the production effect in the journal Experimental Psychology (Caplan & Guitard, 2024; Experimental Psychology: A feature-space theory of the production effect in recognition). I would also like to highlight that the authors might want to consider an exploratory Bayesian analysis if the results lean toward the null or inconclusive direction; this would not affect their current analysis plan but might be a strength at the next stage. Best of success in the data collection and in the next steps of this project.

Reviewed by Miguel Vadillo, 08 Jul 2024

As I explained in my previous review, I am not an expert in linguistic processing, but the rationale and logic behind the experiment seem compelling, even to a relatively naïve reader like me. I appreciate the new additions to the introduction, which help put the study in context. In particular, I liked the new information about the way in which the authors plan to interpret different potential results, including floor and ceiling effects. The new protocol responds to all my previous comments and I am happy to recommend acceptance. Just one very minor suggestion: given the relatively large number of participants, it is quite likely that at least occasionally either the hit or the false alarm rates will approach 0 or 1. Some type of correction will be needed if that happens. Maybe this is something worth considering in the analysis plan.
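
To make the concern about extreme rates concrete, here is a minimal sketch of one of the options named in the decision letter: a log-linear adjustment of the kind proposed by Hautus (1995), which adds 0.5 to each response count and 1 to each trial total before converting to z-scores. The function name and the trial counts are illustrative assumptions, not the authors' analysis code.

```python
from statistics import NormalDist


def d_prime_loglinear(hits, n_old, false_alarms, n_new):
    """d' with a log-linear correction (assumed here for illustration).

    Adding 0.5 to each count and 1 to each trial total keeps both rates
    strictly between 0 and 1, so the z-transform stays finite.
    """
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participants: 20 old items and 20 lures per condition.
perfect = d_prime_loglinear(hits=20, n_old=20, false_alarms=0, n_new=20)
chance = d_prime_loglinear(hits=10, n_old=20, false_alarms=10, n_new=20)
print(round(perfect, 2), round(chance, 2))
```

Without the adjustment, the first participant would have a hit rate of 1 and a false alarm rate of 0, for which the z-transform is undefined. The correction is applied to every participant, not only those with extreme rates, so that all scores are computed the same way.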


Evaluation round #1

DOI or URL of the report: https://osf.io/es53y?view_only=e81e8f3256e84c8e9abb783b30879943

Version of the report: 1

Author's Reply, 30 Jun 2024


Dr. Tanja Roembke

Chair of Cognitive and Experimental Psychology

Institute of Psychology

RWTH Aachen University

Tanja.Roembke@psych.rwth-aachen.de

 

June 30, 2024

 

Prof. Dr. Chris Chambers, Recommender

Peer Community in: Registered Reports

 

RE: Revision of a manuscript for consideration (manuscript #382)

 

Dear Prof. Dr. Chris Chambers, 

 

We thank you for the opportunity to revise our manuscript, “The role of semantic encoding in production-enhanced memory: A registered report” (#382), for consideration as a registered report in Peer Community in: Registered Reports. We are very sorry for the long delay in getting this revision back to you.

 

By addressing the very helpful reviewers’ concerns and suggestions we believe we have improved the manuscript significantly in terms of its conceptual scope, the sample size justifications and additional quality controls (e.g., how floor effects may be addressed), while also including some missing methodological details. 

 

More specifically, we expanded the introduction to include additional background information on the production effect and spreading activation. Importantly, we also added in the discussion of different computational models (as suggested by Reviewer 1) that differ in their assumptions about whether speaking adds only sensorimotor features (i.e., modality-dependent features) or whether speaking can also add other features such as semantics (modality-independent features). We now explain how our study might inform this discussion. Furthermore, we expanded the section on quality controls, walking through different possible performance patterns in our data, and how we might address floor effects. We also completed additional power analyses taking into account Reviewer 2’s suggestion to consider the impact of lower correlation estimates. Finally, we included additional methodological details and completed stimulus selection. 

 

In the Response to Reviewers (separate uploaded file), we outline our changes to the manuscript (all highlighted in yellow) with page numbers and lines, in response to each of the comments (which are also included).

 

We confirm that this work is original and is not being considered for publication elsewhere. All authors agree to the contents of this manuscript.

 

Thank you for your consideration.

 

Kind regards,

 

Tanja Roembke, Ph.D., and

Rachel Brown, Ph.D.

Decision by Chris Chambers, posted 14 May 2023, validated 14 May 2023

I have now obtained three very helpful reviews of your Stage 1 submission. The reviewers agree that this is a rigorous and promising proposal, while also offering a range of constructive suggestions to consider. Headline issues to address include the inclusion of literature to provide additional context and strengthen the study rationale, greater prospective interpretation of null results (which is addressed in your design template but could perhaps be strengthened in the main text), inclusion of additional methodological details (including graphic presentations), and consideration of additional or alternative analyses.

One of the reviewers suggests the addition of Bayesian analyses, which can be useful for providing positive evidence of no effect. If including these, please be sure to define all priors and other parameters precisely, and if also including frequentist tests, be sure to explain which outcomes (Bayesian or frequentist) will determine whether a hypothesis is supported or not.

I look forward to receiving your revised manuscript in due course.

Reviewed by anonymous reviewer 1, 12 May 2023

After carefully reviewing Stage 1, here are some comments organized by key criterion. Overall, the Stage 1 manuscript was original, and I look forward to seeing the next version. 

 

1A. The scientific validity of the research question(s).

The authors provide a clear theoretically motivated research question. The research question is embedded in rich literature on the production effect and spreading activation to better understand the underlying processes driving the production effect with an original investigation with bilingual (English/German) speakers. In addition, the writing is clear and easy to follow. 

P1 (reference): This is a minor point, but the production effect was well established before the work of MacLeod et al. (2010). For instance, Murray (1965) published an article in Nature called “Vocalization-at-presentation, Auditory Presentation and Immediate Recall”. 

P2 (alternative account): The RFM (Revised Feature Model; Saint-Aubin et al., 2021, JML), which has been applied to account for the production effect in immediate recall, free recall, and reconstruction of order, might strengthen your argument here (or serve as an alternative account). According to this computational account, production blocks rehearsal processes but adds modality-dependent features (information related to the presentation of the information, such as color, sound, pitch) and, to my understanding, has no effect on modality-independent features (information related to categorization and internal processes). It might help as a likely alternative if the results are null. 

 

1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable.

The logic, the rationale, and the plausibility of the proposed hypotheses are clear and well founded based on theory. Overall, the introduction was well written. 

P3 (null results or alternative results): I would encourage the authors to provide further clarification of the implications of the null results. What are the theoretical implications of the results if nothing works as expected? It is potentially important to consider the implications of transfer-appropriate processing in the rationale. 

 

1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).

Overall, the methodology, the analysis pipeline, and the statistical power analysis were okay. I have some minor suggestions or clarification questions. 

P4 (recording): My understanding is that participants will be tested remotely and will have to record their “production”. The authors mention an inclusion criterion of 95% for “production”. Could the authors clarify how this will be assessed (apologies if I missed that information)?

P5 (design question): This may be my misunderstanding. Will you present information at encoding in both languages or only one language? Are there any theoretical differences between pure and mixed lists based on the spreading of activation?

P6 (stimuli): The authors' description suggests they will be very careful in selecting stimuli, but I would appreciate seeing the stimulus lists, as stimulus selection is a major point and has been shown to drastically affect results in the past. 

P7 (Bayesian analysis): I do not want to impose statistical preference, but could the authors add Bayesian analyses?

 

1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.

 

Overall, I believe the authors provide great clarity. The stimuli would help facilitate the clarity of the methodological details. 

P8 (method figure): Given the novelty of the procedure, it might be very helpful for the reader to have an illustration of the procedure for encoding and the test.  

 

1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).

Overall, I believe the authors have considered many key aspects of the quality of the results. Additional information about outcome-neutral conditions could be beneficial (e.g., what happens if the task is too difficult or too easy).

 

Reviewed by Miguel Vadillo, 10 May 2023

Before I start my review of the manuscript, I must disclose that I am not an expert in psycholinguistics or semantic processes and therefore I cannot judge the extent to which the current study is innovative or relevant for the area. On the other hand, I can confirm that the text is accessible for non-expert readers and, at least in my humble opinion, it looks like an interesting project, definitely worth pursuing. Logically most of my comments will be focused on methodological aspects, although to be fair, I only have minimal comments that the authors/editor need not agree with.

The authors have done an excellent job with their simulation-based power analysis, taking as reference different combinations of descriptive statistics from a related previous experiment. They have manipulated orthogonally means, SDs… but, incidentally, they have rather ignored what could be a relevant parameter of the simulation analysis. In particular, if I am not mistaken, they always assume a correlation of 0.5 between dependent measures. This is reasonable, but it is not impossible that the true correlation turns out to be substantially lower than that. Their dependent variable is a d’, which is computed as a difference score (essentially, hits minus false alarms). Difference scores are notoriously unreliable (see the famous Hedge et al. paper on BRM), which means that they do not correlate very well with anything. Perhaps it would make sense to run sensitivity analysis with lower correlation estimates and adjust sample size accordingly depending on the results.
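
The sensitivity check suggested here can be illustrated with a rough analytic sketch (not the authors' simulation). For a within-participant contrast, the standard deviation of the difference between two conditions depends on their correlation r, so a lower r shrinks the standardized effect and raises the required sample size. The normal approximation, the function names and all numeric values below are assumptions chosen only for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sd_of_difference(sd_a, sd_b, r):
    """SD of the paired difference between two measures correlated at r."""
    return sqrt(sd_a**2 + sd_b**2 - 2 * r * sd_a * sd_b)


def approx_n_paired(mean_diff, sd_a, sd_b, r, alpha=0.05, power=0.90):
    """Approximate n for a two-sided paired test (normal approximation)."""
    d_z = mean_diff / sd_of_difference(sd_a, sd_b, r)
    z = NormalDist().inv_cdf
    return ceil(((z(1 - alpha / 2) + z(power)) / d_z) ** 2)


# Hypothetical effect: a 0.3 difference in d' with SD = 1.0 per condition.
for r in (0.5, 0.3, 0.1):
    print(r, approx_n_paired(mean_diff=0.3, sd_a=1.0, sd_b=1.0, r=r))
```

A simulation-based power analysis would replace the closed-form step with repeated sampling, but the direction of the effect is the same: the required n grows as r falls.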

Also related to power planning, is the number of trials per condition similar to Fawcett et al.? Otherwise, the SD estimate entered into the power simulation might not be adequate (more trials should result in less variability in effect estimates across participants).

In several places, the authors argue that if they do not find a significant production effect this could be due to the difficulty of the task (p. 11 and elsewhere). But I wonder if the opposite prediction could be made: Given that the production effect is defined as an increase in accuracy relative to baseline, would it be easier to detect the effect in conditions where baseline performance is relatively low? In other words, without additional information, I don’t think that the difficulty of the task on its own provides a sufficient explanation for any failure to observe the effect.

Perhaps this is standard practice in this area of research, but I found it surprising that there are twice as many “old” items in the recognition test (20 previously in blue + 20 in white) as “lures” (just 20). Wouldn’t this bias participants towards responding “old”? I know that SDT disentangles sensitivity and criterion, but SDT comes with a number of assumptions that might not hold, in which case d’ estimates might be affected by response criterion. Wouldn't it be better to include 40 lures?

On page 22 I found it odd that the authors explain that they will test spoken vs. silent conditions as indexed by Hedges’ g. Whether or not they report effect sizes is independent of how they will test their hypotheses. So, the sentence sounds a bit odd, because it seems to imply that the t-test will be run on Hedges’ g. On a different note, I am also not sure there are good reasons to report Hedges’ g instead of the more familiar Cohen’s d. Essentially, both effect size estimates measure the same thing, with the only exception that g corrects for a small bias in small samples. But with N = 75, d and g will probably agree to the second decimal, and the equations for d are far more familiar to the average reader. I see no good reason for reporting g (although of course this is not incorrect or invalid).

Just a suggestion for the authors: if the production effect is at least partly based on semantic processes, they would expect the production effect to influence other semantically driven effects, like false memories in the DRM paradigm. This could be an idea for future research.
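
The size of the small-sample correction can be checked directly. The sketch below assumes a paired design in which d is computed as the mean difference divided by the SD of the differences (often written d_z), and uses the common approximation J = 1 - 3 / (4 * df - 1) for the correction factor. Which standardizer the authors use is not stated here, so both choices, along with the function names and sample values, are assumptions.

```python
from statistics import mean, stdev


def cohens_dz(diffs):
    """Cohen's d for paired data: mean difference over SD of differences."""
    return mean(diffs) / stdev(diffs)


def hedges_correction(df):
    """Approximate small-sample correction factor J for Hedges' g."""
    return 1 - 3 / (4 * df - 1)


def hedges_g_paired(diffs):
    """Hedges' g for paired data, using df = n - 1."""
    return cohens_dz(diffs) * hedges_correction(len(diffs) - 1)


# With N = 75 participants, df = 74 in a paired comparison.
j = hedges_correction(74)
print(round(j, 4))

# Hypothetical spoken-minus-silent difference scores for five participants.
diffs = [0.2, 0.5, 0.1, 0.4, 0.3]
print(round(cohens_dz(diffs), 3), round(hedges_g_paired(diffs), 3))
```

With df = 74 the factor is 1 - 3/295, so g is about 1% smaller than d. The gap is much larger in the five-participant example, which is the small-sample situation the correction is meant for.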

 

Signed,

Miguel Vadillo

Reviewed by anonymous reviewer 2, 12 May 2023

This is a very interesting article, which will contribute not only to the current production effect literature, but also will be useful for overall linguistics theories. The hypothesis and predictions are well explained and appropriate for the study design. The power analysis is well justified and available on OSF. I recommend this manuscript to move to the next stage, with some minor revisions:

1. Additional information about spreading activation models and multi-step activation would be beneficial for better understanding of the hypothesis and predictions. 

2. Especially for production and spreading activation (page 5, line 10), more details/information are needed here to then link it to the production effect.

3. I find the wording on page 6, line 13 "that semantic encoding plays a role in the production effect" misleading, since, as the authors mention, the production effect is found even in the absence of semantic representations. Maybe replace it with the wording used further down: "contributes to the production effect".

4. Page 7, line 7, has this been claimed before?

5. Page 7, line 16-17, do they do only the recognition task twice, or both learning and recognition twice?

6. Page 19, line 11: "classical" should be "classic"

7. For the data analysis section, I suggest also considering LMM instead of ANOVAs, to allow the authors to include item and participant effects and multiple comparisons.
