DOI or URL of the report: https://osf.io/x6q3p?view_only=e81e8f3256e84c8e9abb783b30879943
Version of the report: 2
I would like to thank the authors for their careful consideration of my comments. After thoroughly reading the manuscript and the responses to my and the other reviewers' points, I am satisfied with the new version of the manuscript and look forward to seeing the experiment conducted. The authors might be interested in a recently published mathematical account, which might help with the discussion section once the data are collected. It is part of the special issue on the production effect in the journal Experimental Psychology (Caplan & Guitard, 2024; Experimental Psychology: A feature-space theory of the production effect in recognition). I would also like to note that the authors might want to consider an exploratory Bayesian analysis if the results lean toward the null or are inconclusive. This would not affect their current analysis choices but might be a strength for the next step. I wish the authors every success with the data collection and the next steps of this project.
As I explained in my previous review, I am not an expert in linguistic processing, but the rationale and logic behind the experiment seem compelling, even to a relatively naïve reader like me. I appreciate the new additions to the introduction, which help put the study in context. In particular, I liked the new information about how the authors plan to interpret different potential results, including floor and ceiling effects. The new protocol responds to all my previous comments and I am happy to recommend acceptance. Just one very minor suggestion: given the relatively large number of participants, it is quite likely that at least occasionally either the hit rate or the false alarm rate will equal 0 or 1, in which case d’ cannot be computed directly. Some type of correction will be needed if that happens. Maybe this is something worth considering in the analysis plan.
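As an illustration of the kind of correction meant here, the sketch below applies the log-linear rule (add 0.5 to each count and 1 to each trial total) before converting rates to z-scores. This is only one common option, not necessarily the one the authors will adopt. The function name and the trial counts (20 old items per condition, 20 lures) are assumptions chosen for the example.

```python
from statistics import NormalDist


def d_prime_loglinear(hits, n_old, false_alarms, n_new):
    """d' with the log-linear correction applied to both rates.

    Adding 0.5 to each count and 1 to each total keeps both rates
    strictly between 0 and 1, so the z-transform stays finite.
    """
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant with perfect performance: 20/20 hits, 0/20 false alarms.
# Uncorrected rates of 1 and 0 would give an infinite d'; the corrected value is finite.
print(d_prime_loglinear(hits=20, n_old=20, false_alarms=0, n_new=20))
```

Applying the correction to every participant, rather than only to those with extreme rates, avoids treating participants differently depending on their data.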
DOI or URL of the report: https://osf.io/es53y?view_only=e81e8f3256e84c8e9abb783b30879943
Version of the report: 1
Dr. Tanja Roembke
Chair of Cognitive and Experimental Psychology
Institute of Psychology
RWTH Aachen University
Tanja.Roembke@psych.rwth-aachen.de
June 30, 2024
Prof. Dr. Chris Chambers, Recommender
Peer Community in: Registered Reports
RE: Revision of a manuscript for consideration (manuscript #382)
Dear Prof. Dr. Chris Chambers,
We thank you for the opportunity to revise our manuscript, “The role of semantic encoding in production-enhanced memory: A registered report” (#382), for consideration as a registered report in Peer Community in: Registered Reports. We are very sorry for the long delay in getting this revision back to you.
By addressing the reviewers’ very helpful concerns and suggestions, we believe we have improved the manuscript significantly in terms of its conceptual scope, the sample size justifications and additional quality controls (e.g., how floor effects may be addressed), while also including some missing methodological details.
More specifically, we expanded the introduction to include additional background information on the production effect and spreading activation. Importantly, we also added a discussion of different computational models (as suggested by Reviewer 1) that differ in their assumptions about whether speaking adds only sensorimotor features (i.e., modality-dependent features) or whether speaking can also add other features such as semantics (modality-independent features). We now explain how our study might inform this discussion. Furthermore, we expanded the section on quality controls, walking through different possible performance patterns in our data, and how we might address floor effects. We also completed additional power analyses taking into account Reviewer 2’s suggestion to consider the impact of lower correlation estimates. Finally, we included additional methodological details and completed stimulus selection.
In the Response to Reviewers (separate uploaded file), we outline our changes to the manuscript (all highlighted in yellow) with page numbers and lines, in response to each of the comments (which are also included).
We confirm that this work is original and is not being considered for publication elsewhere. All authors agree to the contents of this manuscript.
Thank you for your consideration.
Kind regards,
Tanja Roembke, Ph.D., and
Rachel Brown, Ph.D.
I have now obtained three very helpful reviews of your Stage 1 submission. The reviewers agree that this is a rigorous and promising proposal, while also offering a range of constructive suggestions to consider. Headline issues to address include the inclusion of literature to provide additional context and strengthen the study rationale, greater prospective interpretation of null results (which is addressed in your design template but could perhaps be strengthened in the main text), inclusion of additional methodological details (including graphic presentations), and consideration of additional or alternative analyses.
One of the reviewers suggests the addition of Bayesian analyses, which can be useful for providing positive evidence of no effect. If including these, please be sure to define all priors and other parameters precisely, and if also including frequentist tests, be sure to explain which outcomes (Bayesian or frequentist) will determine whether a hypothesis is supported or not.
I look forward to receiving your revised manuscript in due course.
After carefully reviewing the Stage 1 manuscript, I have organized my comments by key criterion. Overall, the Stage 1 manuscript was original, and I look forward to seeing the next version.
1A. The scientific validity of the research question(s).
The authors provide a clear theoretically motivated research question. The research question is embedded in rich literature on the production effect and spreading activation to better understand the underlying processes driving the production effect with an original investigation with bilingual (English/German) speakers. In addition, the writing is clear and easy to follow.
P1 (reference): This is a minor point, but the production effect was well established before the work of MacLeod et al. (2010). For instance, Murray (1965) published an article in Nature called “Vocalization-at-presentation, Auditory Presentation and Immediate Recall”.
P2 (alternative account): The RFM (Revised Feature Model; Saint-Aubin et al., 2021, JML), which has been applied to account for the production effect in immediate recall, free recall, and reconstruction of order, might strengthen your argument here (or offer an alternative account). According to this computational account, production blocks rehearsal processes but adds modality-dependent features (information related to the presentation of the item, such as color, sound, or pitch) and, to my understanding, has no effect on modality-independent features (information related to categorization and internal processes). It might serve as a likely alternative explanation if the results are null.
1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable.
The logic, the rationale, and the plausibility of the proposed hypotheses are clear and well founded based on theory. Overall, the introduction was well written.
P3 (null results or alternative results): I would encourage the authors to provide further clarification of the implications of null results. What are the theoretical implications if nothing works as expected? It may also be important to consider the implications of transfer-appropriate processing in the rationale.
1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).
Overall, the methodology, the analysis pipeline, and the statistical power analysis were okay. I have some minor suggestions or clarification questions.
P4 (recording): My understanding is that participants will be tested remotely and will have to record their “production”. The authors mention an inclusion criterion of 95% for “production”; could they clarify how this will be assessed (apologies if I missed that information)?
P5 (design question): This may be my misunderstanding. Will you present information at encoding in both languages or only one language? Are there any theoretical differences between pure and mixed lists based on the spreading of activation?
P6 (stimuli): The authors' description suggests that they will be very careful in selecting the stimuli, but I would appreciate seeing the stimulus lists, as this is a major point and stimulus choice has been shown to drastically affect results in the past.
P7 (Bayesian analysis): I do not want to impose statistical preference, but could the authors add Bayesian analyses?
1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.
Overall, I believe the authors provide great clarity. The stimuli would help facilitate the clarity of the methodological details.
P8 (method figure): Given the novelty of the procedure, it might be very helpful for the reader to have an illustration of the procedure for encoding and the test.
1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).
Overall, I believe the authors have considered many key aspects of the quality of the results. Additional information about outcome-neutral conditions could be beneficial (e.g., what happens if the task proves too difficult or too easy).
Before I start my review of the manuscript, I must disclose that I am not an expert in psycholinguistics or semantic processes and therefore I cannot judge the extent to which the current study is innovative or relevant for the area. On the other hand, I can confirm that the text is accessible for non-expert readers and, at least in my humble opinion, it looks like an interesting project, definitely worth pursuing. Logically most of my comments will be focused on methodological aspects, although to be fair, I only have minimal comments that the authors/editor need not agree with.
The authors have done an excellent job with their simulation-based power analysis, taking as reference different combinations of descriptive statistics from a related previous experiment. They have orthogonally manipulated means, SDs… but, incidentally, they have rather ignored what could be a relevant parameter of the simulation analysis. In particular, if I am not mistaken, they always assume a correlation of 0.5 between dependent measures. This is reasonable, but it is not impossible that the true correlation turns out to be substantially lower than that. Their dependent variable is a d’, which is computed as a difference score (essentially, hits minus false alarms). Difference scores are notoriously unreliable (see the famous Hedge et al. paper in BRM), which means that they do not correlate very well with anything. Perhaps it would make sense to run a sensitivity analysis with lower correlation estimates and adjust the sample size accordingly depending on the results.
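To make the role of the correlation concrete, the sketch below shows how the standardized paired effect size, and with it the approximate power, falls as the assumed correlation r between conditions decreases. It assumes equal SDs in both conditions, so that the SD of the difference scores is sd * sqrt(2 * (1 - r)), and it uses a normal approximation to the paired t-test. It is not a substitute for the authors' simulation. The mean difference, SD, and function names are hypothetical values chosen for illustration; n = 75 is the sample size mentioned later in this review.

```python
from math import sqrt
from statistics import NormalDist


def paired_dz(mean_diff, sd, r):
    """Standardized effect for paired data, assuming equal SDs in both conditions."""
    sd_diff = sd * sqrt(2 * (1 - r))  # SD of the difference scores
    return mean_diff / sd_diff


def approx_power(dz, n, alpha=0.05):
    """Normal approximation to the power of a two-sided paired t-test.

    Ignores the negligible probability of rejecting in the wrong tail.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(dz * sqrt(n) - z_crit)


# Hypothetical inputs: mean d' difference of 0.2, SD of 0.6 per condition, n = 75.
for r in (0.5, 0.3, 0.1):
    dz = paired_dz(mean_diff=0.2, sd=0.6, r=r)
    print(f"r = {r:.1f}: dz = {dz:.3f}, approx. power = {approx_power(dz, 75):.3f}")
```

With the raw mean difference and the SDs held fixed, a lower correlation inflates the variability of the difference scores, which is why the required sample size grows.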
Also related to power planning, is the number of trials per condition similar to Fawcett et al.? Otherwise, the SD estimate entered into the power simulation might not be adequate (more trials should result in less variability in effect estimates across participants).
In several places, the authors argue that if they do not find a significant production effect, this could be due to the difficulty of the task (p. 11 and elsewhere). But I wonder if the opposite prediction could be made: Given that the production effect is defined as an increase in accuracy relative to baseline, would it be easier to detect the effect in conditions where baseline performance is relatively low? In other words, without additional information, I don’t think that the difficulty of the task on its own provides a sufficient explanation for any failure to observe the effect.
Perhaps this is standard practice in this area of research, but I found it surprising that there are twice as many “old” items in the recognition test (20 previously in blue + 20 in white) as “lures” (just 20). Wouldn’t this bias participants towards responding “old”? I know that SDT disentangles sensitivity and criterion, but SDT comes with a number of assumptions that might not hold, in which case d’ estimates might be affected by response criterion. Wouldn't it be better to include 40 lures?
On page 22 I found it odd that the authors explain that they will test spoken vs. silent conditions as indexed by Hedges’ g. Whether or not they report effect sizes is independent of how they will test the hypothesis. So, the sentence sounds a bit odd, because it seems to imply that the t-test will be run on Hedges’ g. On a different note, I am also not sure there are good reasons to report Hedges’ g instead of the more familiar Cohen’s d. Essentially, both effect size estimates measure the same thing, with the only exception that g corrects for a small bias in small samples. But with N = 75, d and g will probably agree to the second decimal, and the equations for d are far more familiar to the average reader. I see no good reason for reporting g (although of course this is not incorrect or invalid).
Just a suggestion for the authors: if the production effect is at least partly based on semantic processes, they would expect the production effect to influence other semantically driven effects, like false memories in the DRM paradigm. This could be an idea for future research.
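For reference, the size of the small-sample correction can be checked with the usual approximation g = d * (1 - 3 / (4 * df - 1)). The sketch below assumes a within-participant comparison, so that df = N - 1 = 74. The function names and the example value of d are hypothetical.

```python
def hedges_correction(df):
    """Approximate small-sample correction factor applied to Cohen's d."""
    return 1 - 3 / (4 * df - 1)


def hedges_g(d, df):
    """Hedges' g obtained by shrinking Cohen's d with the correction factor."""
    return d * hedges_correction(df)


# Hypothetical example: d = 0.5 in a paired design with N = 75 (df = 74).
n = 75
print(hedges_correction(n - 1))
print(hedges_g(0.5, n - 1))
```

The correction factor approaches 1 as df grows, so the two effect sizes converge in larger samples.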
Signed,
Miguel Vadillo
This is a very interesting article, which will not only contribute to the current production effect literature but will also be useful for linguistic theories more broadly. The hypothesis and predictions are well explained and appropriate for the study design. The power analysis is well justified and available on OSF. I recommend that this manuscript move to the next stage, with some minor revisions:
1. Additional information about spreading activation models and multi-step activation would be beneficial for better understanding of the hypothesis and predictions.
2. Specifically for production and spreading activation (page 5, line 10), more details are needed in order to link it to the production effect.
3. I find the wording on page 6, line 13 "that semantic encoding plays a role in the production effect" misleading, since, as the authors mention, the production effect is found even in the absence of semantic representations. Maybe replace it with the wording used further down, "contributes to the production effect".
4. Page 7, line 7: has this been claimed before?
5. Page 7, lines 16-17: do they do only the recognition task twice, or both learning and recognition twice?
6. Page 19, line 11: "classical" should be "classic"
7. For the data analysis section, I suggest also considering LMM instead of ANOVAs, to allow the authors to include item and participant effects and multiple comparisons.