Understanding oscillatory correlates of pain expectation

based on reviews by Zoltan Dienes, Chris Chambers and Markus Ploner
A recommendation of:

Cue-based modulation of pain stimulus expectation: do ongoing oscillations reflect changes in pain perception?

Submission: posted 15 March 2023
Recommendation: posted 11 August 2023, validated 14 August 2023
Cite this recommendation as:
Learmonth, G. (2023) Understanding oscillatory correlates of pain expectation. Peer Community in Registered Reports, .


Recent studies using an EEG frequency tagging approach have reported modulations of alpha, beta and theta bands at the stimulation frequency during nociceptive/painful thermal stimulation compared to non-nociceptive/non-painful vibrotactile stimulation. Prior expectations of the intensity of upcoming painful stimuli are known to strongly modulate the subjective experience of those stimuli. Thus, modulating the expectation of pain should result in a change in the modulation of oscillations if these factors are indeed linked.
In this study, Leu, Glineur and Liberati will modulate expectations of pain (low or high intensity) prior to delivering thermal cutaneous stimulation (low, medium or high intensity). They will record how intense participants expect the pain to be, and how intense they felt it to be, as well as record EEG to assess oscillatory differences across the expectation and intensity conditions.
The Stage 1 manuscript was reviewed over 5 rounds by 3 reviewers. Based on detailed responses to the reviewers’ comments and edits to the Stage 1 report, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance.
URL to the preregistered Stage 1 protocol:
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.
List of eligible PCI RR-friendly journals:
1. Leu, C., Glineur, E. & Liberati, G. (2023). Cue-based modulation of pain stimulus expectation: do ongoing oscillations reflect changes in pain perception? In principle acceptance of Version 5 by Peer Community in Registered Reports.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviewed by , 07 Aug 2023

I am happy with the authors' response.

Evaluation round #4

DOI or URL of the report:

Version of the report: 4

Author's Reply, 03 Aug 2023

Decision by , posted 01 Aug 2023, validated 01 Aug 2023

Dear authors, 

Please find attached the most recent comments from the reviewer. 

Best wishes,


Reviewed by , 28 Jul 2023

1) The justification for the relevance of the detectable effect size for each claim needs to be in the paper itself, not just in response to me;

2) The conclusions that follow need to be unambiguous. At the moment there is for some rows a statement about a definite conclusion given a non-significant result in the final column - but a note at the bottom disavowing the conclusion. Instead the conclusion in the final row should accurately reflect the actual conclusion legitimated by non-significance, e.g. no definite conclusion will be drawn.

3) Mixing inferential systems can lead to contradictions. Running a default BF only after a non-significant result has two problems: i) the BF may have produced evidence for H0 even if the frequentist test had been significant; ii) the prior - the model of H1 - has not been justified as relevant to the scientific problem; that means the BF has not been justified as relevant to the problem. The thing to do is pre-register inferences based on one system only. Report default BFs if you wish as information for the reader, but do not pre-register conclusions that follow from the BF; instead the conclusions follow from the frequentist statistics. And then if you report BFs for some tests, report BFs for all similar tests, so there is consistency.

4) Post hoc power (if this means basing power on the obtained effect size) simply reflects what the p value ended up being. So don't report post hoc power. Instead decide now what the power is, and define the possible set of inferences now based on the power determined now.
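
Point 4 can be illustrated with a minimal sketch. It assumes a two-sided z-test (a simplification; the planned analyses are not z-tests), and the function name `observed_power` is hypothetical. Because the "observed" effect is recovered from the p value alone, the resulting power carries no information beyond the p value itself.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Post hoc power of a two-sided z-test, computed from the p value alone."""
    # Absolute z statistic implied by the two-sided p value.
    z_obs = _STD_NORMAL.inv_cdf(1 - p_value / 2)
    # Critical value for the chosen alpha level.
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Power if the true effect were exactly equal to the observed one.
    return _STD_NORMAL.cdf(z_obs - z_crit) + _STD_NORMAL.cdf(-z_obs - z_crit)


# Observed power falls monotonically as the p value rises.
for p in (0.01, 0.05, 0.20, 0.50):
    print(f"p = {p:.2f} -> observed power = {observed_power(p):.2f}")
```

Under this simplification, a result with p exactly equal to alpha always has observed power of about 0.5, whatever the sample size or design.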

Evaluation round #3

DOI or URL of the report:

Version of the report: 3

Author's Reply, 20 Jul 2023

Decision by , posted 13 Jul 2023, validated 13 Jul 2023

Dear authors,

Please find attached the latest round of reviewer comments for your Stage 1 proposal. I would be grateful if you could respond to these remaining points at your earliest convenience. 

Best wishes,


Reviewed by , 12 Jul 2023

I appreciate the detailed reply. The conclusions do have to be justified appropriately by the results however; and at the moment they still do not line up.

1) The authors should indicate what effect size their N=40 does give them power to detect; then judge if a smaller effect would be theoretically interesting. If so, they should be clear in the Design Table that a non-significant result does not indicate the theory has been shown wrong.

2) The authors still need to indicate what power they do have for each test in the Design Table. They could proceed in the same way: for their planned sample size, indicate what effect they do have sufficient power to detect (for THAT test), and if a smaller effect would still be interesting, appropriately indicate in the final column of the table that a non-significant result would not refute the claim tested. This applies to outcome-neutral tests too, which means toning down the existing language.

3) Bayes factors confront a similar issue: They are only meaningful tests of a theory if the scale factor represents the sort of effect predicted by the theory. Note in this case, the most relevant aspect of the prediction of a theory is not the minimal meaningful effect, but the sort of effect predicted. The rough size of effect predicted is in general easier to justify scientifically than the minimal meaningful effect; but still there needs to be a justification. A default is just a suggestion to consider if e.g. a Cohen's d of 0.7 is actually relevant; it is not to be used without thought. Rather than mix inferential systems, however (in this case frequentist and Bayesian), the easiest thing to do here would be to rely on the frequentist stats for inference, so these are what appear in the Design Table. There would be no harm in reporting default BFs for information for the reader - then no justification of the prior (model of H1) is needed - but the authors stick to the rationale of hypothesis testing with power. In that case, if the study is not powered to detect small but interesting effects then this is simply recognized in the conclusions afforded by the analysis.

4) The alternative is to go over to BFs for all tests as the system of inference, but this would involve thinking everything through again, which is why I say the simplest thing is to go with the power and N such as they are (and as the authors say they are good by field standards), but recognize the inferential consequences of this in the Design Table.

In sum, what is needed is that every conclusion in every row of the Design Table is explicitly and scientifically justified by the inferential procedure.
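
The sensitivity calculation requested in points 1 and 2 could be sketched as follows. This is a rough illustration only: it assumes a two-sided one-sample or paired test on per-participant differences, uses a normal approximation, and takes alpha = .05 and power = .80 as assumed values; the function name `minimal_detectable_d` is hypothetical. The planned mixed models would need their own calculation for each row of the Design Table.

```python
from math import sqrt
from statistics import NormalDist


def minimal_detectable_d(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized effect (Cohen's d) detectable with the given power.

    Normal approximation for a two-sided one-sample / paired test;
    the exact t-based value is slightly larger for small n.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return (z_alpha + z_power) / sqrt(n)


# N = 40 is the planned sample size mentioned in the review.
print(round(minimal_detectable_d(40), 2))  # about 0.44 under this approximation
```

If effects smaller than the value returned would still be theoretically interesting, a non-significant result cannot be read as evidence against the claim tested.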

Evaluation round #2

DOI or URL of the report:

Version of the report: V2

Author's Reply, 29 Jun 2023

Decision by , posted 22 Jun 2023, validated 22 Jun 2023

Dear authors,

Please find attached the latest round of reviewer comments for your Stage 1 proposal. I would be grateful if you could address the last few outstanding points regarding the power calculation. I look forward to receiving your response at your earliest convenience.

Please do get in touch if I can be of further assistance in the meantime.

Best wishes,


Reviewed by , 14 Jun 2023

The authors have convincingly addressed my comments. I wish them good luck with their study and I am looking forward to seeing the results.  

Reviewed by , 18 Jun 2023

The authors have spelt out more concretely - and usefully - a power calculation for the interaction. But my issue has not been addressed.

1) Text has been added about how small the N has been in previous studies to detect the effect; but this does not tell us in itself how large the N must be to infer no effect should the effect be non-significant. 

2) Power needs to be demonstrated for each test in the study design template, by a calculation for each test in itself, using the effect size for that specific test that one does not wish to miss out on.

3) The power that is calculated is calculated for the size of interaction expected, not for the size that one does not want to miss out on.

Reviewed by , 09 Jun 2023

I'm satisfied with the authors' response to my points.

Evaluation round #1

DOI or URL of the report:

Version of the report: 1

Author's Reply, 02 Jun 2023

Decision by , posted 17 May 2023, validated 17 May 2023

Dear authors,

Please find attached detailed comments on your Stage 1 proposal from 3 reviewers. You can see that they are generally very positive, and the reviewers have highlighted a number of points that, when addressed, will strengthen your overall proposal.

Please get in touch if you have any queries in the meantime.

Best wishes,


Reviewed by , 06 May 2023

The proposed study aims to investigate the relationship between ongoing oscillations in the brain and the perception of pain. To this end, the authors propose a paradigm in which pain is modulated by changing noxious stimulus intensity and expectations of upcoming pain in 30 healthy human participants. Expectations will be modulated by presenting visual cues indicating upcoming pain intensity. 
The study is well-planned, and the manuscript is mostly clear and convincing. However, it might benefit from clarifications and added details:
1. Framework. The proposed study aims to investigate the relationship between ongoing oscillations and pain perception. To this end, they propose a cue-based expectation paradigm to modulate pain. However, there are numerous possibilities to modulate pain. They might explain why they will particularly use expectation to modulate pain. Moreover, they might consider recent EEG studies on expectation effects on pain (Bott et al., 2023; Strube et al., 2023).
2. Hypotheses. The authors should specify whether the hypotheses on the relationships between ongoing oscillations and pain perception are directed or undirected, i.e. do they expect positive or negative relationships for the different frequency bands?
3. Participants. The authors should describe their sampling strategy and the inclusion and exclusion criteria in more detail. How will they recruit participants? Will they perform convenience sampling? How are gender and ethnicity accounted for?
4. Experimental procedure. The paradigm should be specified in sufficient detail to replicate the findings. However, essential information is lacking. What will be the stimulation site? What will the latencies be between visual cue and expectation rating, expectation ratings and thermal stimulation, and between thermal stimulation, auditory cue, and pain intensity ratings? A figure detailing the paradigm might be helpful.
5. Behavioral measures. The rating scales and their anchoring should be detailed and explained. The rating scale assesses the intensity of general perception rather than the intensity of pain. As the study aims at investigating brain-pain relationships a rating scale assessing pain intensity might be more appropriate.
6. Specificity. Pain is associated with many different perceptual, cognitive, emotional, and physiological processes which are not specific to pain. Thus, relationships between pain and brain activity can equally well reflect other pain-associated but not pain-specific processes. Studies investigating brain-pain relationships therefore often contain control conditions. The authors might explain why the proposed study does not include a control condition such as non-painful thermal stimulation, as their previous studies did.
7. Blinding. The authors should specify whether the experimenters will be blinded during recordings and analyses.
8. Analysis. The procedure resulting in “aggregated amplitudes” should be specified in more detail.
9. Negative findings. In the design table, questions 4 and 6 are most important. It is specified that negative findings would mean that ongoing oscillations might not be related to pain perception. This is a rather vague interpretation. The authors might think about clearer interpretations of negative findings. Using Bayesian rather than frequentist statistics might help with the interpretation of negative findings.
10. Code/data sharing. The authors should specify whether they will share the data and the codes for stimulations and analyses. If they share, it should be specified where code and data will be available. If they do not share, this should be justified.
11. References. For some details, the authors refer to a study under review (Leu et al., 2023). As this information is not available so far, the authors should provide the details in the current manuscript rather than referring to unpublished manuscripts.
12. Errors. In the design table, the DV for the first question should likely be the expectation rating rather than the perceived intensity rating. On p.15, third paragraph, amplitude is likely the DV rather than the IV.

Reviewed by , 29 Apr 2023

The paper proposes to investigate the neural oscillatory correlates of pain perception by creating conditions in which the physical stimulation is the same but cyclic changes in pain perception are different: The same medium stimulation creating the perception of relatively high vs low pain, based on expectation differences.

I did wonder if any result that did emerge would be a reflection of specifically pain perception rather than say a difference between intense vs weak sensation more generally, or more strongly attended vs somewhat less strongly attended stimuli. This is an issue the discussion could address in the final Stage 2 - unless there is a quick answer that could be given in the introduction.

Not being an EEG expert I will comment on the statistics, and specifically the power calculation.
Power is one means by which a justification could be given for why a non-significant result should be taken seriously. Or to put it another way, if one is to use frequentist hypothesis testing, power needs to be calculated in such a way that a non-significant result could be taken seriously. The aim of power is to control the long term risk of missing an effect of interest. That is, one should ensure power is calculated with respect to any effect that could be of interest; in other words, it should be calculated with respect to a roughly minimal interesting effect size. Thus, PCI RR guidelines say "power analysis should be based on the lowest available or meaningful estimate of the effect size." Some thoughts here may be useful:
As far as I can make out, the authors used a value for relevant parameters based on a past paper. BTW the authors do not provide enough information to reproduce their calculations - please provide exact numbers with justification why they were chosen in particular. The value obtained in a past paper does not define the value that one is prepared to miss out on. Presumably an effect half the size found previously would still be of theoretic interest - and one wouldn't want to miss out on it.
Power must be calculated for each test in the Design Template separately, with due sensitivity to the nature of that DV.
Take the predicted effect of perceived pain on the modulation of neural oscillations. The extent of the modulation must depend on the extent of the perceived pain difference. There are data that indicate what the modulation is estimated to be for a particular known pain perception difference, based on past work; in the simplest case of one such study, one could draw a line from that point of oscillation modulation vs pain difference to (0,0). The authors have from their pilot an estimate of the pain difference they are likely to obtain. So the degree of modulation, in raw units, can be estimated for the pain difference they are likely to obtain.  But what we want is the roughly smallest possible difference. So put an 80% CI on the estimated modulation in the first step from a past paper, and repeat the procedure, drawing a line from there to (0,0).
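
The scaling procedure just described could be sketched as below. All numbers are hypothetical placeholders (they are not taken from the pilot or from any past paper), the function names are illustrative, and the 80% CI assumes an approximately normal sampling distribution for the past estimate.

```python
from statistics import NormalDist


def scaled_modulation(past_modulation: float, past_pain_diff: float,
                      expected_pain_diff: float) -> float:
    """Scale a past modulation along a straight line through (0, 0)."""
    slope = past_modulation / past_pain_diff  # modulation per unit of pain difference
    return slope * expected_pain_diff


def lower_bound_80(estimate: float, standard_error: float) -> float:
    """Lower limit of a two-sided 80% confidence interval (normal approximation)."""
    z_90 = NormalDist().inv_cdf(0.90)  # 80% two-sided interval uses the 90th percentile
    return estimate - z_90 * standard_error


# Hypothetical inputs, for illustration only.
PAST_MODULATION = 0.30     # raw amplitude difference reported in a past study
PAST_SE = 0.10             # its standard error
PAST_PAIN_DIFF = 30.0      # rating difference that produced that modulation
EXPECTED_PAIN_DIFF = 15.0  # rating difference estimated from the pilot

expected = scaled_modulation(PAST_MODULATION, PAST_PAIN_DIFF, EXPECTED_PAIN_DIFF)
smallest = scaled_modulation(lower_bound_80(PAST_MODULATION, PAST_SE),
                             PAST_PAIN_DIFF, EXPECTED_PAIN_DIFF)
print(f"expected modulation: {expected:.3f}")
print(f"roughly smallest plausible modulation: {smallest:.3f}")
```

The second value, not the first, is the one that power would be calculated against.
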
Numbers of trials will affect the population by-participants Cohen's d. Make sure when using past studies one takes into account any difference in quantity of data used in the previous and current study (durations over which data are collected, number of trials).
I know this is FAR more work than is usually done in non-RRs. But RRs are an opportunity to tighten up on our scientific inference, so we have a chain of inference that actually holds together, at least roughly.

p 14 "a right tailed multi-sensor cluster-based permutation test using Wilcoxon signed-rank test as test statistic will be used"
Can a reference be given for why this controls for multiple testing?
Also describe or give a reference for the exact procedure.

"taking potential type II error inflation due to multiple testing into account."
Did you mean Type I?

"A separate LMM is calculated for the amplitude at the FOI in each frequency band."

How will familywise error be controlled?
Note: Power must be determined given the family wise error correction used.
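
To illustrate why the correction matters for power, here is a minimal sketch. It assumes a Bonferroni correction across three tests (for example one model per frequency band; both the correction and the number of tests are assumptions, not the authors' stated plan), a hypothetical effect size of d = 0.45, and a normal approximation to a two-sided paired test.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d: float, n: int, alpha: float) -> float:
    """Approximate power of a two-sided one-sample / paired test (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value at the given alpha
    shift = d * sqrt(n)                # expected test statistic under the alternative
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


N = 40        # planned sample size mentioned in the review
D = 0.45      # hypothetical effect size, for illustration only
M_TESTS = 3   # hypothetical number of tests in the family
ALPHA = 0.05  # assumed familywise alpha

power_uncorrected = approx_power(D, N, ALPHA)
power_bonferroni = approx_power(D, N, ALPHA / M_TESTS)
print(f"power at alpha = {ALPHA}: {power_uncorrected:.2f}")
print(f"power at alpha = {ALPHA}/{M_TESTS}: {power_bonferroni:.2f}")
```

With the same N and the same effect size, power is noticeably lower once the per-test alpha is reduced, so a power analysis run at the uncorrected alpha overstates the sensitivity of each test.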

"normality and linearity will be assessed visually" 
As this will be done once the data are collected, it allows analytic flexibility - choices could be made based on the p-values obtained. Could a blind analysis procedure be used? (That is, the condition labelling is removed or scrambled and the data with IV information removed given to an analyst just to make this decision).
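
One way such a blind procedure might look is sketched below. The function name `blind_labels` and the condition codes are hypothetical; the idea is only that a third party replaces the real condition names with neutral codes and keeps the key until the assumption checks have been finalised.

```python
import random


def blind_labels(conditions, seed):
    """Replace condition names with neutral codes; return blinded labels and the key."""
    rng = random.Random(seed)                 # seeded so the key can be regenerated
    unique = sorted(set(conditions))          # distinct real condition names
    codes = [f"cond_{i}" for i in range(len(unique))]
    rng.shuffle(codes)                        # random assignment of codes to names
    key = dict(zip(unique, codes))            # kept by someone other than the analyst
    return [key[c] for c in conditions], key


# Hypothetical trial list; the analyst receives only `blinded`.
trials = ["LM", "HM", "LM", "HH", "LL", "HM"]
blinded, key = blind_labels(trials, seed=2023)
print(blinded)
```

Trials from the same condition still share a code, so distributional checks within conditions remain possible without revealing which condition is which.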

How often do the MLMs fail to converge with this sort of data? Make sure there is no analytic flexibility left over here: Describe how convergence will be ensured without analytic flexibility.

Reviewed by , 15 May 2023

This is a promising proposal to investigate the causal relationship between pain perception and neural oscillations. I particularly appreciated the interventional nature of the design, which stands in contrast to the predominance of studies that focus on correlations between behaviour and oscillations.

The method is generally strong (with appropriate inclusion of pilot data to validate the primary methodology). I have a few comments/suggestions for the authors to consider in revision:

1. The design is tight but I wonder about the issue of functional specificity and, in particular, whether factors other than pain perception could explain any observed modulation of oscillatory activity. For instance, could any change in oscillatory activity between LM vs HM reflect greater attention to the stimulus in the HM condition rather than greater perception of pain? One way to address this would be to insert some kind of additional stimulus into the pain-eliciting stimulus on 50% of trials (such as a short temporal gap or other transient) and include an attentional control task that, on some trials, requires the participant to decide whether the transient is present or absent (rather than making a pain judgement). By titrating the transient to a threshold level of detectability, you could determine whether the cue alters detection sensitivity, and thus whether attentional effects are likely to be mixed in with pain perception. If you then found evidence of no effect of the cue on detection sensitivity it would strengthen the causal link between oscillatory changes and pain perception, independently of attention. I suggest this merely as an option for the authors to consider at a conceptual level rather than a concrete design change (as the authors may have better ideas, or there may be valid reasons to discount this issue). Any changes to the design would require careful piloting.

2. There are various points where additional methodological detail is needed to ensure that the methods are computationally reproducible and close off potential (inadvertent) researcher degrees of freedom.

(a) Independent Component Analysis (Fast ICA algorithm) -- please specify in advance all parameters for this analysis, and for all other EEG analysis steps that refer to general procedures. A Stage 1 RR must be computationally reproducible even when it refers to previous methods. Ideally an analysis script should be included as part of the submission.

(b) Outlier exclusion: visual exclusion is bias-prone unless done very rigorously using blinded analysts. Can it not be done using objective criteria?

"Any data point that will still violate normality or linearity after the transformation or disproportionately affects the dataset after fitting the LMM will be removed from the data set and will not be replaced." Does this apply to data within participants? If so, how much of a dataset must be lost before the participant is excluded? Presumably excluded participants will be replaced to ensure that the minimum sample size is met?

"Additionally, data points that over-proportionally influence the data set will be identified using Cook’s Distance [D]. This method calculates how much the fitted values of a given data set change if just one data point is removed." Can the authors specify within which cells of the design these tests will be applied? Outlier exclusion can be applied in many different ways (collapsing across conditions or within the most specific cells -- please be specific)

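As a rough illustration of an objective criterion, Cook's distance can be computed with a fixed, pre-specified rule. The sketch below uses ordinary simple linear regression rather than the planned LMM (a deliberate simplification), the data are made up, and the 4/n cut-off is one common rule of thumb, not a threshold taken from the manuscript.

```python
def cooks_distances(x, y):
    """Cook's distance for each point in a simple linear regression of y on x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)            # residual variance
    leverages = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
    # D_i = e_i^2 / (p * mse) * h_i / (1 - h_i)^2
    return [(e ** 2 / (p * mse)) * h / (1 - h) ** 2
            for e, h in zip(residuals, leverages)]


# Made-up data in which the last observation is clearly aberrant.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1, 6.9, 20.0]

distances = cooks_distances(x, y)
threshold = 4 / len(x)  # rule-of-thumb cut-off, fixed in advance
flagged = [i for i, d in enumerate(distances) if d > threshold]
print(flagged)
```

Whether the function is applied to all data at once or separately within each cell of the design changes which points are flagged, which is why the cells need to be named in advance.
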
3. Minor points

p11: "These results prove the effectiveness of the chosen paradigm to change the subjective intensity perception of the applied stimuli towards the presented cue." Suggest replacing "prove" with "confirm".

If the authors are able to increase the sample size to achieve power of 0.9 (rather than 0.8), it would open up the possibility of PLOS Biology being interested in this article as a PCI RR-interested journal (since they set a minimum 0.9 power requirement). In addition, if they increase power to 0.9 and decrease alpha to .02, it will release Cortex as a PCI RR-friendly outlet (see details here). I mention this for information only, as I appreciate the authors may face resource restrictions that prevent the necessary increase in sample size that would be required in each case, and the a priori evidence strength is (in my view) otherwise sufficient for PCI RR.
