DOI or URL of the report: https://osf.io/y2zxn?view_only=9384332360894f718ae1ba304b5e1c5d
Version of the report: 2 (PCIRR-Stage1-Revised-Manuscript-Instructed_Extinction.docx)
Your revised manuscript has now been evaluated by the three reviewers from the previous round. The good news is that all are broadly satisfied and we are close to being able to award Stage 1 in-principle acceptance (IPA). As you will see, there are some remaining issues to address concerning clarification of the rationale/predictions and ROI definitions/labelling. Provided you are able to address these comprehensively in a final revision, I anticipate being able to award IPA without further in-depth Stage 1 review.
The authors have done a very careful and thorough job in addressing the reviewers' comments (mine but also those of the other 2 reviewers, as far as I can judge) and revising the manuscript. I do not see many remaining issues. One thing that still stands out, though, is the ambivalence regarding what the effect of instructions should be expected to be for the extinction of aversive versus appetitive learning. This is evident already in the abstract: the authors write initially that 'In a therapeutic context, this asymmetry [i.e., aversive learning being slower to extinguish than appetitive learning] that has been discussed as indicative of a ‘better safe than sorry strategy’ could potentially be overcome by making patients aware of the change in contingency' [implying that instructions should make extinction of aversive cues more like extinction of appetitive cues]. Yet a few lines down, they state that 'We expect [that the] effect [of instruction] is more pronounced for appetitive stimuli [than for aversive stimuli]', which implies that instructions would further enhance the asymmetry between extinction of appetitive and aversive cues in the context of pain. It is also the latter prediction that they subscribe to in their response to point 3 of my initial review. Here they refer to the same 'better safe than sorry strategy' that is supposed to yield the initial asymmetry between extinction of appetitive versus aversive cues as the reason to expect a lesser effect of instruction, whereas the opening sentences of the abstract suggest that instructions are expected to counter exactly the effects of that strategy.
This ambivalence regarding the anticipated effect of instructions shines through elsewhere also. E.g., on p. 4, lines 6-9, it is again suggested that instructions may be a promising method to prevent the incomplete extinction of aversive associations (implying that it would bring aversive extinction more in line with appetitive extinction). This conceptual confusion deserves ironing out, I think.
Other than that one issue, however, I think that the authors have done an excellent job and have dealt with all the comments in a satisfactory way. I would be happy to see this registered report advance to stage 2.
Tom Beckers
After going through the revised manuscript and author replies in detail, I am happy with how the authors have responded to all of the reviewer comments and revised the manuscript accordingly. I commend the authors for their thorough work.
The only remaining point I have pertains to the definitions of the regions-of-interest (ROIs) for the fMRI analyses, which I unfortunately overlooked in my first review - my apologies.
The manuscript states (p. 30, line 31-33 on tracked changes version): "SBFC analyses will use the left and right dlPFC and vmPFC, amygdala, and striatum as seeds, with masks derived from the FSL Harvard-Oxford Atlas...".
However, the dlPFC, vmPFC, and striatum do not exist as regions in the Harvard-Oxford Atlas (the amygdala does). I would ask the authors to precisely define the regions from the Harvard-Oxford Atlas (or another atlas) that will be used to construct the specified ROIs, or to offer an alternative definition of the ROIs and a description of how the masks will be built.
As a side note to this point, the striatum is first mentioned in the manuscript as "ventral striatum" but all other mentions are to "striatum" only, so I am not certain whether this means both ventral and dorsal striatum together, or whether it is merely shorthand for the ventral striatum - please clarify. Based on the extensive evidence that the ventral and dorsal striatum have different functions, shown also in many fMRI conditioning/learning studies, it would make sense to differentiate them and (re)consider whether to include e.g. only the ventral striatum, or both.
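For concreteness, a minimal sketch of how such masks could be assembled from the Harvard-Oxford max-probability atlases (assuming Python with nilearn; the specific atlas labels chosen to approximate dlPFC, vmPFC, and ventral striatum are illustrative assumptions, and exactly the choices the authors would need to specify and justify):

```python
# Sketch only: builds binary ROI masks from the Harvard-Oxford max-probability atlases.
# The label choices below (e.g., Middle Frontal Gyrus for dlPFC) are illustrative
# assumptions, not the authors' definitions.
import numpy as np
from nilearn import datasets, image

cort = datasets.fetch_atlas_harvard_oxford("cort-maxprob-thr25-2mm")  # cortical (not lateralised)
sub = datasets.fetch_atlas_harvard_oxford("sub-maxprob-thr25-2mm")    # subcortical (lateralised)

def make_mask(atlas, wanted_labels):
    """Binary mask of all voxels whose max-probability label is in wanted_labels."""
    data = image.get_data(atlas.maps)
    values = [atlas.labels.index(lbl) for lbl in wanted_labels]  # voxel value == label index
    return image.new_img_like(atlas.maps, np.isin(data, values).astype(np.uint8))

# Hypothetical approximations that would need justification:
dlpfc = make_mask(cort, ["Middle Frontal Gyrus"])  # would need splitting by hemisphere for L/R seeds
vmpfc = make_mask(cort, ["Frontal Medial Cortex", "Subcallosal Cortex"])
ventral_striatum = make_mask(sub, ["Left Accumbens", "Right Accumbens"])
amygdala = make_mask(sub, ["Left Amygdala", "Right Amygdala"])
```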
I'm satisfied with the authors' replies to my comments and I look forward to seeing the results of this RR.
DOI or URL of the report: https://osf.io/spjh9?view_only=2ec6e1c7b79a4b45b53d33b4415cc7e2
Version of the report: 1 (PCIRR-Stage1-Snapshot-Instructed_Extinction.pdf)
I have now obtained three very helpful and detailed expert reviews of your submission. As you will see, the reviewers find much to like in your proposal, which I agree is broadly very rigorous and tackles a valid and interesting research question. With RRs, the devil is always in the detail, and there are a number of areas requiring careful attention to satisfy the Stage 1 criteria -- including clarification and strengthening of the rationale (especially for the fMRI experiment but to varying extents throughout both planned studies), additional methodological detail in key areas to ensure reproducibility and eliminate potential sources of bias (e.g. in relation to exclusion criteria, sampling plans, and analysis plans), and justification of specific design decisions.
Although there is some significant work to be done, the reviews are very constructive and I believe your manuscript is a promising candidate for eventual Stage 1 in-principle acceptance. On this basis, I would like to invite a major revision and point-by-point response to all comments, which I will return to the reviewers for another look.
I have read this stage 1 proposal with great interest. It addresses an interesting research question of basic and translational interest with overall sophisticated and appropriate methods. Whatever the study's results, they should be of interest to the field. However, whereas the overall rationale for the study is sound and convincing, quite a number of specific aspects of the design, procedures, hypotheses and analyses of the proposed study are less well justified or elaborated. As such, I think that there are a number of issues throughout the manuscript that merit addressing prior to the start of data collection. I list them below in order of appearance in the manuscript:
1. The description of the prior study from which the current proposal takes inspiration is rather confusing to me (p. 3). First, on line 12 and beyond, I think CS and US have been switched; the increase and decrease in pain are USs, not CSs. More importantly, the description of the results of that study (lines 17-21) on the one hand suggests that changes in differential CS valence ratings over the course of extinction training were similar for appetitive and aversive CSs (lines 17-19) but at the same time states that extinction of aversive CS valence ratings was incomplete (with an unstated implication that it was complete for appetitive CS valence; lines 19-21). It would be good to clarify this.
2. Further down the introduction (p. 4), it is stated that Sevenster et al. (2012) showed that instructed extinction immediately abolished differential US expectancies but left SCR to the CS+ unaffected (lines 5-7). While factually correct, this is a bit misleading, given that differential SCR was completely absent from the first extinction trial in the instructed extinction group (see Sevenster et al., 2012, Figure 4). More broadly, I think there is little evidence in the literature to support the claim that instructed extinction is less effective for SCR than for US expectancies (in fact, even the claim by Sevenster et al., 2012, that it is less effective for fear-potentiated startle has been disputed).
3. I don’t fully understand what the rationale is for predicting a stronger effect of instructions on extinction of the appetitive than the aversive CS (e.g., p. 5, lines 18-19, but also elsewhere). Given that without instructions, extinction is expected to be weaker/slower for the aversive CS, one would think that there is more room for instructions to facilitate extinction for that cue. The authors seem to be ambivalent about this as well, because further down in the manuscript, they make different predictions in this regard for US expectancies (see H4c on p. 24) than for CS valence (see H4d on p. 25), without further justification or discussion. I think this needs straightening out, given that testing this specific interaction between US type and instruction is the core raison d’etre for the proposed study’s factorial design. In that sense, it is also a bit strange to formulate this analysis as being exploratory; the authors are clearly intending to do confirmatory analyses to test the presence of an interaction and undoubtedly hope to draw directional conclusions from the results regarding the presence or absence of a difference in the effect of instructed extinction on appetitive versus aversive learning.
4. I think the introduction in its present form does not provide sufficient justification for the inclusion of the various dependent variables that will be measured. The authors do state that the US expectancy ratings will be the primary measure of interest, but what are the valence ratings, SCRs and pupil dilation responses each supposed to add? Why are they included? And how will the authors handle divergence in results between these measures? Justification for the inclusion of SCR and pupil dilation in particular is not trivial. Is there good evidence that SCR and pupil dilation are appropriate measures here, in a design that involves a salient appetitive as well as a salient aversive US? Particularly given the nature of the proposed procedure, where trials start with a pre-US situation of moderate pain: The fact that the appetitive CS no longer signals a reduction in pain during extinction might be perceived as an aversive outcome in the appetitive group, which could support an increase rather than a decrease in SCR to the appetitive CS during extinction, which would not actually reflect a lack of learning. The same may be true for pupil dilation. This may all hinder interpretation of possible differences between the appetitive and the aversive CS during extinction. At a minimum, this warrants some justification/consideration. None of these issues would seem to plague the US expectancy ratings, for which a direct comparison of responding to the appetitive and the aversive CS seems much more straightforward.
5. Relatedly, no clear justification is provided for including only US expectancy as a measure of conditioning in the analyses for the second manuscript.
6. Regarding the sample size and stopping rule, two issues warrant elaboration. First, I think it would be more appropriate to stop data collection once, for all hypotheses, a BF10 of either 6 or 1/6 is reached, rather than a BF > 6 for all hypotheses, so as not to bias data collection towards positive results (a minimal sketch of such a symmetric criterion is given after this list). Second, it isn’t clear how the authors, starting from the effect size in Sevenster et al. (2012), arrived at their intended sample size of 150 (p. 14). This deserves elaboration.
7. Reinstatement of conditioned responding (or recovery of extinguished responding in general) isn’t really covered in the introduction; as a result, the inclusion of a reinstatement phase in the experiment feels like it is lacking a clear rationale.
8. The authors list an appropriately ordered difference in US painfulness ratings as manipulation check. Fair as this may be, I think a more relevant positive control would be the observation of differential acquisition for all measures and for both types of US by the end of acquisition.
9. The US is variably described as an increase/decrease/constant temperature, or as a pain exacerbation / pain decrease / no change in pain. One is a way to achieve the second, obviously, but it would help clarity if the authors used a consistent terminology for what the USs are throughout.
10. Participants are excluded if they have recently participated in pharmacological studies (p. 14), but can they have taken part in conditioning/extinction experiments before? It seems like that might affect their speed of acquisition and extinction learning rather substantially (cf. the literature on re-acquisition and re-extinction).
11. It is a bit odd that the label for the US expectancy scale reads ‘most probably cooling decrease’ on the one side and ‘most probably heating’ on the other side (p. 16, line 11-12). Why not just ‘most probably cooling’ and ‘most probably heating’ (or, alternatively, ‘most probably cooling decrease’ and ‘most probably heating increase’)?
12. On p. 17, covariates are introduced that haven’t been mentioned previously. Will these questionnaire scores be used for screening out participants only, or will they also be used for additional analyses on included participants?
13. I found the description of the calibration procedure on p. 20 difficult to follow. In particular, I failed to make sense of the following sentence (lines 2-3): “This second step will consist of the application of the selected temperature level within a range of -1.5 to and +3.0°C in steps of 0.5°C”.
14. Regarding the trial structure: the intertrial interval of 4-7 s seems unlikely to be sufficient for unconditioned skin conductance responses to the US to return to baseline. A longer interval seems indicated.
15. Why does the extinction phase involve a slightly smaller number of trials per CS than the acquisition phase? This also translates into different numbers of trials in between US expectancy ratings, for instance, and more generally makes the analyses a bit difficult to compare between acquisition and extinction. Why not simply have 16 trials per CS in each phase?
16. Regarding the analyses, I found the use of ‘time’ for the independent variable of trial a little unfortunate, given that ‘time bin’ is also used as an IV in some analyses, and time in the latter case refers to within-trial time, whereas in the former case it refers to a much larger scale. ‘Trial’ would probably be a clearer descriptor than ‘time’.
17. SCR will be divided in FIR and SIR. It would be good to provide a rationale for this (or, alternatively, to not distinguish between FIR and SIR).
18. Regarding the second-level connectivity analyses (p. 28), I would have expected that connectivity would be used as a predictor and acquisition and extinction indices as outcomes (lines 23-25).
19. Regarding the same analyses, it isn’t clear whether hypotheses regarding acquisition will also be tested – the title on line 16 does suggest so, but on lines 31-32, only hypotheses regarding extinction are mentioned.
20. On the next page, to test the direct effect of instruction, US expectancy immediately before and after the instruction will be compared for the instructed group only (lines 5-7). This seems a bit odd. Wouldn’t it make more sense to compare the difference in US expectancy from Acq5 to Ext1 between the instructed and the non-instructed group? Likewise, to evaluate the effect of instruction on learning through experience, it would seem more sensible to compare the difference in US expectancy from the end of acquisition to the end of extinction between both groups. Confusingly, further down (lines 19-23), the proposed test of the hypothesis does involve an interaction with instruction group, in contradiction to the preceding section.
21. Regarding the pilot study (p. 30), the authors indicate that one participant was excluded due to poor data quality. It would be good to know how poor data quality is defined, given that this may happen in the proposed study as well and should perhaps be mentioned as grounds for exclusion.
22. While discussing the results of the pilot study, the authors indicate on p. 31 that, whereas the mean expectancy for the CSdecrease returned to zero, the mean expectancy for the CSincrease remained elevated at the end of extinction. I’m not certain that that is clear from Figure 2, in particular when considered relative to responding to CSmedium.
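Sketch referenced in point 6 above: a minimal illustration in Python of the symmetric Bayesian stopping criterion (the threshold of 6 comes from the manuscript as described; the function name and example values are assumptions for illustration only).

```python
def stop_data_collection(bf10_by_hypothesis, threshold=6.0):
    """Symmetric Bayesian stopping rule: stop only once EVERY hypothesis has reached
    evidential support in either direction, i.e. BF10 >= threshold (support for H1)
    or BF10 <= 1/threshold (support for H0). Stopping only when all BF10 > threshold
    would bias data collection towards positive results."""
    return all(bf >= threshold or bf <= 1.0 / threshold
               for bf in bf10_by_hypothesis.values())

# Hypothetical example: H1 and H2 have reached a bound, H3 has not, so testing continues.
print(stop_data_collection({"H1": 8.2, "H2": 0.12, "H3": 2.4}))  # False
```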
Tom Beckers
I’ve read the RR1 manuscript titled “Modulatory effects of instructions on extinction efficacy in appetitive and aversive learning: A registered report” by Dr. Busch and colleagues carefully. In the RR proposal, the authors want to investigate the effects of verbal instructions on aversive and appetitive conditioning, using heat pain (or pain relief) stimulation. For several outcome measures (i.e., CS expectancy ratings, CS valence ratings, pupil dilation and skin conductance responses) the effects of instructed extinction and aversive vs. appetitive conditioning will be assessed.
All in all, I think this is a good research proposal on a topic that, due to the methods involved (e.g., psychophysiological measures), usually suffers from restricted sample sizes. Therefore, I believe it is a good thing that a well-powered RR will be conducted on this topic. Furthermore, the report is well-written and the authors seem experts on the involved methods. As such, I do not have many things to add.
My only more major comment is related to the different dependent variables (i.e., CS expectancy ratings, CS valence ratings, pupil dilation and skin conductance responses) and how differences in results between these DVs should be interpreted. What if the CS type X time interaction during acquisition or extinction is significant for one DV, but not for the others? How should the hypothesis then be interpreted? I think that, in principle, the different DVs test the same hypothesis (e.g., steeper extinction slopes for appetitive than for aversive CSs). Therefore, I believe that correction for multiple testing should probably be applied (see the sketch below). In that case, if a significant effect (using an adjusted alpha-level) is observed for any of the DVs, this effect can be interpreted as supporting the hypothesis.
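A minimal sketch of one way such an adjusted alpha across the four DVs could be applied (assuming Python with statsmodels and, e.g., a Holm correction; the p-values below are made up for illustration):

```python
# Illustration only: Holm correction across the four DVs testing the same hypothesis;
# the hypothesis counts as supported if any corrected test is significant.
from statsmodels.stats.multitest import multipletests

dvs = ["US expectancy", "CS valence", "SCR", "pupil dilation"]
pvals = [0.004, 0.030, 0.210, 0.090]  # hypothetical p-values for the CS type x time interaction

reject, p_corrected, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for dv, p, sig in zip(dvs, p_corrected, reject):
    print(f"{dv}: corrected p = {p:.3f}, significant = {sig}")

print("Hypothesis supported:", any(reject))
```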
Smaller comments:
- Title: I struggle a bit with the terminology, because typically when considering appetitive conditioning, I think of things like pairing CSs with chocolate or erotic pictures (van den Akker et al., 2017). What the authors do in their paradigm seems more akin to relief learning (i.e., relief from a painful stimulus). I am not entirely sure whether this is the same thing as appetitive conditioning. However, I do not have good recommendations for the authors to change their terminology (except for maybe “pain” and “pain relief” learning).
- P. 4: I believe that Sevenster et al. (2012) observed a lack of effects of instructions on the startle response, rather than skin conductance responses. And even this interpretation is somewhat dubious, because in their follow-up tests, Sevenster et al. (2012) observed facilitated extinction in the instructed extinction group with startle as well. Indeed, the literature indicates mostly ubiquitous effects of verbal instructions with different DVs (Atlas & Phelps, 2018; Costa et al., 2015; Mertens et al., 2018; Mertens & De Houwer, 2016).
- Fig. 1: Perhaps this figure can be reorganized a bit (particularly for any eventual publications) to make the left panel larger (e.g., by putting the two images below one another, rather than next to each other).
- P. 23: Regarding the covariate analyses including US painfulness ratings, gender, age, etc. Are these really needed? Perhaps in an exploratory sense, they could be interesting. However, adding them to the main analyses based on model improvement seems not needed and could complicate the interpretation of the results in my view. Particularly, due to randomization, any systematic effects of these covariates should be nullified. Furthermore, when effects are reported including the covariates, it may be hard for readers to gauge whether effects crucially depend on the inclusion of the covariates. Hence, for ease of interpretation, I would recommend simply not including these covariates in the main analyses.
I am not an expert in fMRI analyses, so unfortunately, I could not really evaluate the appropriateness of the analyses. However, at first sight, I think the analyses seem appropriate and I believe that the preregistration of the analyses pipeline in this RR is very valuable, given the many degrees of freedom in analyzing fMRI datasets (Botvinik-Nezer et al., 2020).
References
Atlas, L. Y., & Phelps, E. A. (2018). Prepared stimuli enhance aversive learning without weakening the impact of verbal instructions. Learning & Memory, 25(2), 100–104. https://doi.org/10.1101/lm.046359.117
Botvinik-Nezer, R., Holzmeister, F., Camerer, C. F., Dreber, A., Huber, J., Johannesson, M., Kirchler, M., Iwanir, R., Mumford, J. A., Adcock, R. A., Avesani, P., Baczkowski, B. M., Bajracharya, A., Bakst, L., Ball, S., Barilari, M., Bault, N., Beaton, D., Beitner, J., … Schonberg, T. (2020). Variability in the analysis of a single neuroimaging dataset by many teams. Nature, 582(7810), 84–88. https://doi.org/10.1038/s41586-020-2314-9
Costa, V. D., Bradley, M. M., & Lang, P. J. (2015). From threat to safety: Instructed reversal of defensive reactions. Psychophysiology, 52(3), 325–332. https://doi.org/10.1111/psyp.12359
Mertens, G., Boddez, Y., Sevenster, D., Engelhard, I. M., & De Houwer, J. (2018). A review on the effects of verbal instructions in human fear conditioning: Empirical findings, theoretical considerations, and future directions. Biological Psychology, 137, 49–64. https://doi.org/10.1016/j.biopsycho.2018.07.002
Mertens, G., & De Houwer, J. (2016). Potentiation of the startle reflex is in line with contingency reversal instructions rather than the conditioning history. Biological Psychology, 113, 91–99. https://doi.org/10.1016/j.biopsycho.2015.11.014
van den Akker, K., Schyns, G., & Jansen, A. (2017). Altered appetitive conditioning in overweight and obese women. Behaviour Research and Therapy, 99, 78–88. https://doi.org/10.1016/j.brat.2017.09.006
This registered report submission outlines a study to answer the research question of whether verbal instruction may make extinction more efficient during appetitive (pain relief) than aversive (pain exacerbation) learning, additionally investigating associations of learning and extinction indices with pre-task resting-state fMRI connectivity between regions-of-interest and the rest of the brain. The majority of the submission is very clearly written and extremely thorough. In most aspects, it is an excellent registered report for a well-planned study. I am satisfied with the behavioral and psychophysiological section of the submission and would accept the parts intended for Manuscript 1 almost as-is, with only minor points to address. However, I have some central criticisms pertaining to the research questions, theoretical background and hypotheses posed in the resting-state fMRI section intended for Manuscript 2, which I think should be resolved before acceptance.
1A. The scientific validity of the research question(s)
1B. The logic, rationale, and plausibility of the proposed hypotheses (where a submission proposes hypotheses)
1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable)
1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses
1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).