The role of emotion and age on different facets of episodic memory (“what”, “when”, “in which context”) 

Based on reviews by Mara Mather and 1 anonymous reviewer
A recommendation of:

The role of positive and negative emotions on multiple components of episodic memory (“what”, “when”, “in which context”) in older compared to younger adults: a pre-registered study


Submission: posted 12 April 2023
Recommendation: posted 19 March 2024, validated 25 March 2024
Cite this recommendation as:
Wonnacott, E. (2024) The role of emotion and age on different facets of episodic memory (“what”, “when”, “in which context”). Peer Community in Registered Reports.


How does emotion influence item memory (what?), temporal memory (when?) and associative memory (in which context?), and does this differ for younger and older adults? Previous research has found inconsistent results, possibly due to small sample sizes. In this study, Laulan and Rimmele (2024) will build on the paradigm in Palombo et al. (2021) in which participants see images embedded in videos and are asked to remember the images (what?), their temporal position within the videos (when?), and the association between the images and the videos (in which context?). Image valence (positive vs negative vs neutral) and participant age-group (18-30 vs 60-80 yr olds) are manipulated. Pre-registered analyses will first look at the two age groups separately to test for an effect of valence for each of the memory components, and second test for modulating effects of age-group. To be cost-effective, a sequential analysis approach with statistical analyses conducted at three time points and a maximum sample size of 150 younger and 150 older adults is planned.
The Stage 1 manuscript was evaluated over two rounds of in-depth review. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol:
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:

1. Laulan, P. & Rimmele, U. (2024). The role of positive and negative emotions on multiple components of episodic memory (“what”, “when”, “in which context”) in older compared to young adults: a pre-registered study. In principle acceptance of Version 3 by Peer Community in Registered Reports.
2. Palombo, D. J., Te, A. A., Checknita, K. J. & Madan, C. R. (2021). Exploring the Facets of Emotional Episodic Memory: Remembering “What,” “When,” and “Which”. Psychological Science, 32, 1104–1114.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Evaluation round #2

DOI or URL of the report:

Version of the report: 2

Author's Reply, 13 Mar 2024

Decision by Liz Wonnacott, posted 04 Mar 2024, validated 05 Mar 2024

Thank you for submitting this revised document. I was able to secure a review from one of the original reviewers. The other reviewer was unfortunately unavailable; however, I have also read the paper myself and feel I have enough to make a decision.

In short, I agree with the reviewer that this is a strong revision. The reviewer has raised only a few, fairly minor, points. I have one remaining concern about the power analyses. In the new manuscript, you now have two hypotheses for each RQ: an (a) hypothesis where you look at the effect of valence in each of the two age groups separately, and a (b) hypothesis where you will look for the interaction between valence and age. However, it looks like at present your power analyses only address power for (b), the interaction. Can you also include analyses to check that you will have power for (a), i.e. test that you can detect an effect of valence with the planned individual group size?

Also, and this is probably me being a bit pedantic, I would word the 1a hypotheses each as two hypotheses, "[effect X holds] for younger adults" and "[effect X holds] for older adults", rather than "[effect X holds] for both younger and older adults" (i.e. acknowledging the fact that one hypothesis could be confirmed and not the other).

I invite you to do one further revision addressing these points. Since I think these changes should be fairly minor, I don’t anticipate needing to send the paper for further review. Given this, I should be able to turn it round for you reasonably quickly once it's submitted.

Reviewed by anonymous reviewer 1, 29 Feb 2024

The authors did a tremendous job revising their RR. This will be an interesting study.

I have only a few remaining comments that I hope will be helpful to the authors in sharpening some of their ideas:

1- Based on the authors’ response to one of my initial queries, it is now clearer that the authors are interested in using low arousal content (their decision to do so in relation to aging makes sense). Yet, I just want to caution that it is my understanding that some theoretical models in the literature pertain to arousal and not valence, e.g., ABC: Arousal-Biased-Competition model from Mather’s group. The authors do not explicitly reference ABC but the authors do cite Mather, 2007, which is about “Emotional Arousal and Memory Binding.” I think the field is still trying to understand arousal versus valence effects (with some authors emphasizing arousal and others valence). Thus, I think it is crucial that the authors take great caution in applying theories of “emotion” to their paradigm if some of such theories are rooted in arousal, whereas their manipulation is chiefly one of valence. Indeed, you almost miss the low arousal decision in the methods and I might suggest making this decision more salient in the introduction, but if not, it would be good that this is discussed carefully in the discussion later. Thus, I am not suggesting a large change to the introduction but a subtle “pulling out” of this valence / arousal consideration or at least exercising a bit more caution so the reader is well aware of the nuances at play and caveats in relation to what aspects of the literature their own hypotheses stem from. Just to give one example (but there are certainly other places this can come up), the authors state: “In view of the low level of evidence in the literature regarding the combined effects of age and emotion on the memory for intrinsic item features and extrinsic item context, we are cautious in formulating our hypotheses about temporal memory and associative memory for context extrinsic to the items.” I might go farther and state “particularly for low arousal content.”

2- The authors made some improvements to the way they reference different types of temporal memory paradigms but they use the term “source memory” a little inconsistently: when discussing some of the paradigms “which list/session” [which to me are source memory paradigms], the authors do not consistently reference those as such, even though they use that term elsewhere. My read of their intro and/or the literature is that it is not so clear yet that “the beneficial effect of emotion on memory for temporal information seems to be robust when it concerns the moment when an event occurred” which is based on source memory findings, so I wonder if the authors want to (1) use more consistent terminology and (2) just soften this a little more. I understand that the authors believe the findings of Ceccato et al., 2022 (which did not observe an emotional memory effect in a source memory paradigm) might be due to the delay but for now this is speculation.

e.g., how about: “the beneficial effect of emotion on memory for temporal information seems to be robust when it concerns the moment when an event occurred, though this is based on a small number of studies with some exceptions”

3- If I may propose, the authors should refrain from using “he/she” but instead use either “he/she/they” or another variant to highlight the diversity of gender and make the language more inclusive (n.b., I assume the authors plan to ensure gender is matched across young and old and may want to include that in the methods). Apologies for not pointing this out earlier.

4- small typo: itemcontext should be “item context”

Evaluation round #1

DOI or URL of the report:

Version of the report: 1

Author's Reply, 11 Jan 2024

Decision by Liz Wonnacott, posted 09 Jun 2023, validated 09 Jun 2023

Thank you for submitting your work as a registered report to PCI RR. Your paper has received two reviews from experts in the area. Both are generally positive and consider the study overall well designed. However, they do each raise some (different) points about aspects of the design which you should consider. The first reviewer also has some important comments on the literature review, with places where you could be a little more nuanced in your discussion. Consideration of these points may also impact on design choices. 

I have some further comments/questions concerning the analysis plan. First, I would like to note that the models are generally well described with a good level of detail, and the outcomes of analyses are specifically linked to the hypotheses, which is great. However, I would like to ask for a little more detail in places and to further probe the rationale for the decisions you made, to make sure you have the best plan in place. (NB: I would like to acknowledge that I haven’t used GEEs myself, as I mostly use mixed effect models; however, from a quick look at some online resources it seems like the fixed effect factors are set up in the same way as in lmer/glmer. I have therefore made suggestions in the same way as I would for those models, but do feel free to educate me if any of my points are misplaced in this context.)

1)       Where you say you are going to have age and valence as factors in each model I think you should add “and their interaction”. In one of the models you also have target-type: how will this be coded? Will you include any interactions with this factor? If so, and any are significant, how will you deal with them? Will they be interpreted but treated as exploratory?

2)       For your inferences, will you be interpreting the specific coefficients that are output from the model which are relevant to each hypothesis? (I.e. using the “Robust z” output for that coefficient to get the p-value, rather than getting the p-value from model comparison of a model with and without the effect in question)? 

3)       You plan to use a quadratic contrast to see if neutral is worse than both positive and negative and a linear contrast to see if positive is worse than neutral which is worse than negative. I can see the logic here, but my understanding was that polynomial contrasts could only be applied to ordinal data with equally spaced levels (e.g. see Can you provide some further justification of this approach?

4)       Regardless of coding implementation, I also want to check that the two tests you have planned here (to be repeated three times each for each DV) are well designed in terms of the patterns that might turn up in your data. Take the first hypothesis: as stated, you are going to test whether the average of positive and negative leads to higher DV than neutral, averaged across both age-groups. The benefit of looking across age groups is that you have more power, but it does mean that you can’t necessarily conclude that the effect holds in each age group separately (and NB even if you had an NS interaction with age you couldn’t do that, as you can’t establish a null interaction from a p value). It could be that actually your effect only really holds in one group, but you still get a significant main effect across the groups. It also might be that you miss seeing the effect because the main effect is NS, but if you were to have looked at one of the groups in isolation, you would have found evidence that the effect was present in that group. An alternative would be to look at the effect separately in each age group (e.g. by running the model twice with age dummy coded but with a different reference level each time and inspecting the coefficient for the contrast in each case). That would seem to me to be more in line with your hypotheses as stated (i.e. that the effect is present “in both groups”). Of course, you would have to ensure power for this.

5)       Power analyses: I appreciate why you base your power analyses on RM-ANOVA given the lack of equivalent software for GEE. However, I am not clear that you are computing power specifically for the equivalent tests. You have two hypotheses, one relating to a main effect and one an interaction. The required sample for 90% power for each of these will be different, however, you report one sample size requirement. I am not fully familiar with the software you are using, but what does it mean to say you have power “for the ANOVA”? Is this specifically the power for the interaction? (I presume you would have chosen that rather than the main effect as it will need more power?). But then would this be power for the omnibus F test for age by valence, as is typically reported for ANOVA? But you are actually interested in the power for the interaction of age by the linear contrast. Do we know that the omnibus F is a reasonable proxy here? Please consider these points and be specific about what you are doing and the justification.

6)       As one of the reviewers points out, when planning for sequential analyses you need to think about the fact that you have six hypotheses you are testing (I am assuming you won’t be looking at the two additional follow up analyses at the interim points) rather than one. From what I understand, you are only going to stop if all of your 6 tests are significant at the required alpha level. I further presume that if a particular test A is significant at the first “look”, but you have to keep going to all three time points (i.e. because other tests were not significant), you would nevertheless still eventually report A as in the final sample at time 3, and using the time 3 alpha boundary? This all seems sensible to me, but it would be good to see if there is precedent/discussion in the literature.

7)       A suggestion: not obligatory, but it could be useful to create your analysis script in R, possibly with a dataframe with some dummy data. It helps to make the analysis plan really clear (and it's of course helpful for you later).

8)       Another suggestion: Even with high power, it is difficult to draw strong conclusions about any null results you might obtain. You could consider adding in either Bayes factors or equivalence tests, which can allow you to make inferences about null effects (Lakens, D., McLatchie, N., Isager, P. M., Scheel, A. M., & Dienes, Z. (2020). Improving inferences about null effects with Bayes factors and equivalence tests. The Journals of Gerontology: Series B, 75(1), 45-57.) Note that the former requires an estimate of effect size and the latter requires an estimate of the smallest effect of interest, so which you use may depend on which of these you find easiest to estimate. Again, this isn’t a requirement for Stage 1 acceptance, just something you might like to consider.
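
As an illustration of the contrast coding raised in point 3, here is a minimal sketch in Python rather than R. It is not the authors' analysis script. The level order (positive, neutral, negative), the weight vectors and the cell means are all assumptions chosen for illustration, and the sketch works on cell means only, not on a fitted GEE.

```python
"""Planned contrasts for a three-level valence factor (illustrative only)."""

# Assumed order of the valence levels used by every weight vector below.
LEVELS = ("positive", "neutral", "negative")

# Polynomial-style weights for three levels; each set sums to zero.
LINEAR = (-1.0, 0.0, 1.0)     # negative minus positive
QUADRATIC = (1.0, -2.0, 1.0)  # positive plus negative, minus twice neutral


def contrast_value(means, weights):
    """Return the weighted sum of the cell means for one contrast."""
    return sum(w * means[level] for w, level in zip(weights, LEVELS))


def are_orthogonal(a, b):
    """Two contrasts are orthogonal when their weights have a zero dot product."""
    return sum(x * y for x, y in zip(a, b)) == 0


# Hypothetical cell means (proportion correct); NOT data from the study.
example_means = {"positive": 0.70, "neutral": 0.60, "negative": 0.74}

linear = contrast_value(example_means, LINEAR)
quadratic = contrast_value(example_means, QUADRATIC)

# The quadratic contrast is twice the gap between the emotional average
# and the neutral mean, so it can be read as "emotional versus neutral".
emotional_avg = (example_means["positive"] + example_means["negative"]) / 2
gap = emotional_avg - example_means["neutral"]

if __name__ == "__main__":
    print(f"linear contrast:    {linear:.3f}")
    print(f"quadratic contrast: {quadratic:.3f}")
    print(f"2 x (emotional - neutral): {2 * gap:.3f}")
    print(f"orthogonal: {are_orthogonal(LINEAR, QUADRATIC)}")
```

With these weights the linear contrast gives the neutral level a weight of zero, so it compares only negative with positive. On its own it does not test whether neutral falls between the two.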

One final point: Apologies if I somehow missed it, but I couldn’t see where you stated the age boundaries for your groups, which seems critical. 

Liz Wonnacott


Reviewed by Mara Mather, 28 May 2023

As outlined in this study preregistration, researchers will examine how positive vs. negative emotion modulate item memory, associative memory and temporal memory for younger vs. older adults. 


Participants will be shown short videos depicting everyday life activities with positive, negative and neutral pictures inserted into these videos to be shown for 2 s each at a random time. For each combination, participants will be asked to rate the compatibility between the video and the picture. After completing this phase and completing questionnaires during a 45-min delay, participants will be shown old and new images and tested on their item recognition. They will also be tested on their memory for when in each 20-s video the picture occurred, and their memory for which screenshot (from 5 different videos) was from the same video context as the one associated with the picture.


This is a well-designed study. My main concern was whether the temporal memory question was being tested appropriately given prior findings.


The authors previously discussed discrepancies between the findings from Ceccato et al. (2022) and Palumbo et al. (2018) and suggested that the 48-h delay in one study but not the other might have contributed to the differences. A critical aspect of this long a delay is that it includes sleep. Jones, Schultz, Adams, Baran, & Spencer (2016) found that the emotional bias of sleep-dependent processing shifts from negative to positive in aging. Thus, increasing the planned delay from 45 min to a day or two, in order to include sleep in the delay period, may increase the likelihood of detecting age differences.


The current study also uses the Ceccato et al. (2022) results to motivate the investigation of the memory for temporal context. However, the Ceccato et al. study “exposed young and older adults to an experimental task in which they saw negative, neutral, and positive images in three sessions that occurred 48 hours apart.” This is quite different than the current study’s memory test where participants will be asked to indicate where in each 20-s short video context each picture was inserted. This is temporal context relative to each video, not relative to the participant’s sense of how long it has been since they viewed the pictures, which could have been the driving influence in Ceccato et al.


These prior results suggest that to more effectively test their hypotheses, the authors should consider including at least two encoding sessions that are each separated by overnight sleep.


The passage of time between different encoding contexts and the memory tests is relevant not only for the question about temporal memory, but also for the age-related positivity effect in item memory. “The second modification to this paradigm will be to increase the delay between encoding and retrieval phases from 10 min in the Palombo et al. (2021) paradigm to 45 min, as the age-related positivity effect in item memory has been shown to increase with time (e.g., Kalenzaga et al., 2016; Laulan et al., 2020).” It would of course be interesting to manipulate these time variables, but given the already many measures in the study, the key goal to focus on would be to select parameters that would increase power to detect the proposed outcomes. The prior research cited in the paper indicating that longer delays may increase the positivity effect and the Jones et al. (2016) findings that sleep may differentially enhance negative memories in younger vs. older adults suggest that having 2 or more encoding sessions should not reduce power to detect the age-related positivity effect.

Mara Mather

(I sign all reviews.)

Reviewed by anonymous reviewer 1, 05 Jun 2023

In a registered report, Dr. Laulan et al. plan to explore different facets of emotional memory in younger and older adults. The study is modelled after a previous report by Palombo et al., (2021), with methodological differences clearly noted. This RR is a clear, mostly well written registered report and could make for an important contribution to the literature. Many of the methodological choices are well conceived and rigorous. However, I do have some major and minor reservations about the reporting of prior literature, the rationale, hypotheses, and the approaches used. I hope my comments are constructive. 


1.          The authors seem confident that emotion will impair associative memory even for positive stimuli. Is the literature so clear on this? Work by Christopher Madan shows that emotion can sometimes enhance associative memory for positive stimuli (e.g., Does such work inform the hypotheses here and if not, why not?

2.          I do not think there is enough granularity in the description of Petrucci and Palombo (2021)’s conclusions; the authors of the current RR state that: “Petrucci and Palombo (2021) explained that the majority of studies on the subject have found that emotion enhances the memory of temporal information.” But it seems that Petrucci and Palombo (2021) highlight that temporal memory is not ‘one thing’ and the effects of emotion on this form of remembering look different depending on whether one is considering e.g., item order, source memory, or duration. It seems there are a lot of mixed findings and the literature is too nascent to draw conclusions. The footnote in the RR is helpful but I do not believe it goes far enough. The same concern applies to the statement: “the beneficial effect of emotion on memory for temporal information appears to be robust (Petrucci & Palombo, 2021)”. Can the authors tone this down a bit?

3.          Elsewhere, it seems that the authors are trying to argue that when you parse different types of intrinsic features (e.g., temporal versus non temporal), one can better explain inconsistencies in the literature. But the evidence provided in the intro is pretty thin on that front. Relatedly, at times the study dances around the idea of comparing the effects of aging × emotion on intrinsic versus extrinsic features of an experience, but the current study design is not optimized for that particular comparison as it potentially conflates temporal versus non temporal memory with intrinsic versus extrinsic. If we see different patterns in the different subtasks, would we be able to conclude it is due to the “intrinsicisity” of the task?

4.          More critical, the findings of Palombo et al., 2021 appear a bit more nuanced than the authors allude to in the introduction. In that paper, the authors examine the bias versus sensitivity of temporal memory and find that the effects of emotion on temporal memory are driven by bias, not by precision. Indeed, to quote Palombo et al: “Critically, we note that in our study, negative emotion did not affect precision per se, but it did affect participants’ responding; in the neutral condition, there was a shift to later temporal estimates. In other words, when participants made temporal judgments in the neutral condition, they tended to judge the events as having happened later. By contrast, in the negative condition, participants’ responses were not consistently biased to be either early or late. If timing was encoded as an intrinsic feature that was enhanced by negative emotion, we might have expected to see enhanced precision in the negative condition, but instead, we observed comparable precision for both conditions. It is not clear what mechanism can account for the present results.” How does this nuance inform the authors’ current hypotheses? Is this the right paradigm to test the authors’ hypotheses? It seems that the authors are interested in temporal precision?

5.          With respect to item memory effects, my comments are more minor but the introduction raised a number of questions for me: Does emotion affect only discrimination of items or does it also affect bias? (See important work by Dougal & Rotello, 2007.) The authors state that the memory advantage for emotional v neutral is greater for negative than for positive stimuli. How much of this is due to insufficient matching of arousal in prior studies between positive and negative? The authors cite the roles of attention and consolidation. What about the role of semantic relatedness (à la Deborah Talmi’s work)? I realize that the introduction is quite long already but it would be nice to include some of this granularity in the introduction.


1.          Do the authors have any exclusionary criteria for data quality that they want to report (e.g., if a participant misses X trials; does not move around the scale during encoding, shows signs of inattention or rushing; ceiling effects)?

2.          How do the authors factor in within subject power? For example, by introducing positive stimuli, the authors are using fewer trials per condition than I have seen in other studies. How does this compare to e.g., Palombo et al., 2021? Can the authors discuss this?

3.          I suggest the authors unpack H1-3 with respect to hypothesis b (“There is an age-related positivity effect, such that…”) to unburden the reader.

4.          For Leclerc & Kensinger (2011) description, please first describe the scales (ranges from X to Y)—for example, is 9 the highest possible score on the scale?

5.          The authors did a nice job with stimuli matching. But it would be helpful to show e.g., superimposed histogram/density style plots of the arousal and valence ratings in younger and older adults for the items chosen to demonstrate not just that the mean is matched but that the distributions are also matched across groups. The SDs give some sense but is not complete. The range would be helpful too. This is not often seen in the literature (unfortunately) but would bolster the notion of tight matching at the manipulation level or at least demonstrate how tight the matching really is. 

6.          The authors state, “Finally, valence and arousal were matched across the two age groups of our study for each category of images (i.e., negative, neutral, and positive images) (all ps > .05).” Might you want to be more conservative (e.g., .2 or no greater than x effect size)

7.          It seems that the authors strived for low arousal images. Why? (“Also, in selecting our images, we sought to keep the arousal of negative and positive images as low as possible (see Waring & Kensinger, 2009)”). Don’t other studies try to aim for high arousal, high valence? Sorry if I misunderstood something.

8.          Did the authors match the categories of images (landscapes, animals etc. across images), as is common in emotional memory studies? These categories are provided in NAPS but can be done for IAPS too. This will keep more of a distinctiveness of items within condition and avoid confusability in the memory tasks. (I am sure the authors are aware that this can be an issue in emotional memory studies, as emotional stimuli tend to come from narrower semantic or perceptual themes; see Deborah Talmi’s work on this.)

9.          The authors state that “Each image will be displayed for 3 s (e.g., see Palumbo et al., 2018)”. Did the authors mean Palombo et al., 2021 instead? And do timings need to be adjusted in light of the inclusion of older folks?

10.       Are there any concerns about administering “emotional” type questionnaires in the delay period? Could that affect early consolidation?

11.       How will the sequential analyses pertain to the different outcomes--are sequential analyses valid in this way when one needs to observe significant effects across the board?

12.       In the study design template table, the reference to the amygdala seems ill fitting and uses reverse inference.

13.       Perhaps I missed this but will participants have a chance to practice the task first? I think practice is important given the inclusion of older adults. 

14.       I do not have strong expertise in GEE; I am familiar with the approach and the analyses seem reasonable but I flag this as a place of weakness in my expertise 

15.       Minor: Consider rephrasing “power deficits” as “insufficient power” 
