DOI or URL of the report: https://osf.io/knpmf
Version of the report: 6
Dear authors
Thank you for submitting the revision and the clarifications of my questions. I see the effect size in the design table has been corrected, and after reading your explanation of the illusion scale I think this part can probably stay as it is (although adding a similar clarification to the manuscript probably wouldn't hurt either).
However, as already explained in direct correspondence, when it comes to the statistics we still have a problem. There is no easy way to reopen the submission for this round, so I must ask for another official resubmission.
As I wrote previously, it is crucial for an RR Stage 1 to lay out the hypothesis carefully and define the approach used to test it. As per the guidelines, all hypotheses require a power analysis. This is to ensure that the risk of obtaining inconclusive findings is minimised. When you work out the power only for the omnibus test, you clearly risk the possibility of non-significant findings in your posthoc tests, especially if there is a large number of pairwise comparisons.
Practically, what this means in your case is that Hypothesis 1 must be powered for the individual pairwise comparisons at alpha = 0.0125. The test as it stands is for an unspecified difference between the four experimental conditions, but the hypothesis you defined is whether there is a significant difference between the MS condition and the NI and NIT conditions. To achieve that, you need to provide the power of your posthoc tests. Since this should be a strong effect, this is unlikely to affect the overall required sample size. There are other solutions (changing the hypothesis to a main effect, using a two-way ANOVA, or planned comparisons), but what I suggested here is the simplest change.
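For illustration, here is a minimal sketch of such a calculation in Python using statsmodels; the effect size, power, and number of comparisons are placeholders, not prescriptions:

```python
# Sketch: sample size for one pairwise paired t-test at a
# Bonferroni-corrected alpha. All parameter values are placeholders.
from statsmodels.stats.power import TTestPower

n = TTestPower().solve_power(
    effect_size=0.8,   # assumed Cohen's dz for a strong effect
    alpha=0.05 / 4,    # Bonferroni correction for four comparisons
    power=0.90,        # target power
    alternative='two-sided',
)
print(f"Required n per pairwise test: {n:.1f}")
```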
I see you have now included power for the posthoc tests in Hypothesis 2. But these are no longer Bonferroni corrected. Given that these are the specific contrasts of interest, you want to ensure that your experiment is sensitive enough to detect these differences but also provides adequate control of false positives. In this context, it isn't clear why you now use 80% power for these tests but 90% for the ANOVAs. Also, why does the necessary sample size differ between the three tests when they all use the same parameters?
Please note that many RRs use simple preplanned pairwise comparisons for their hypotheses, without omnibus tests. This is an acceptable solution and actually the sensible thing to do here; looking back at the review history, we discussed this previously. However, this would be a substantial change from your current design, which, to be honest, I'd not be very comfortable making at this point without sending the manuscript back out for review, especially since the ANOVA was added in response to reviewers' comments. But you could drop the ANOVAs and specify the necessary pairwise tests, with strict correction for multiple comparisons. (The ANOVAs and any exploration of main effects could still be added as exploratory analyses at Stage 2.)
Whatever you do, please explain clearly in your revision how you addressed this issue. As I said, please contact me in advance before submitting. I don't want to make this more difficult than it needs to be, but I am sure you will understand that we cannot have an infinite number of revisions. Let's make sure this next submission is definitely the final one.
Best wishes
Sam Schwarzkopf
Recommender's Post-script
After rereading the comments I have sent you, I figured out the reason for the different sample sizes in your pairwise t-tests for Hypothesis 2. In my previous reading I missed that you are using a one-tailed test for 2C but two-tailed tests for 2A and 2B. I apologise for this oversight - it was an attentional lapse on my part, although it also arose from the description of your hypotheses. (Personally, this is why I prefer less verbose Design Tables that simply state the comparisons - but I believe I'm probably in a minority on that point.)
Anyway, this is yet another example of why the planned statistics must match the hypothesis they are supposed to test. It is not clear why you decided to use a one-tailed test for this comparison. Specifically, the hypothesis reads:
"There will no significant difference in SSEP response across the electrodes of interest (F1 & FC1) when comparing the NIT condition to the NI condition."
This is not a one-tailed comparison. The appropriate hypothesis here would be "The SSEP response across electrodes of interest (F1 & FC1) will be larger for the NIT than the NI condition." Of course it could also be the other way around - it would be crucial to define the direction. But critically, you could have a pronounced difference in the opposite direction, which would then be a non-significant one-tailed effect! It is not clear why you would posit a directional effect here, especially considering that you are hypothesising there is no difference between these two control conditions.
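Incidentally, this tail choice is also what produced the different sample sizes noted above: at otherwise identical parameters, a one-tailed test requires fewer participants. A quick sketch with placeholder values:

```python
# Sketch: identical parameters, different required n for two- vs
# one-tailed paired t-tests. Values are placeholders.
from statsmodels.stats.power import TTestPower

tt = TTestPower()
for alt in ('two-sided', 'larger'):
    n = tt.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                       alternative=alt)
    print(f"{alt}: n = {n:.1f}")
```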
This brings us to another issue, which is that frequentist statistics cannot support the null hypothesis directly. Personally, I find the best way to deal with this is using Bayesian tests, although you could use other approaches (a predefined confidence interval, an equivalence test). But I wouldn't suggest adding this at this stage, as again this should then really be sent back out to reviewers. Instead, I would suggest following the advice I already gave and using stringent control of false positives for your posthoc tests. This ensures you have adequate power for comparing MS and UV, respectively, to NI. But keep in mind that if you use the same effect size for Hypothesis 2C (Cohen's dz = 0.5) you will not have sufficient power to detect smaller differences.
Therefore, you should ask what is the minimum difference between NI and NIT that you would consider evidence against your hypothesis. You may need to adjust this effect size, which could require a larger sample size.
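Purely for illustration (not a request to add this now), an equivalence test built around such a minimum difference could look like the sketch below; the bounds and the data are hypothetical placeholders:

```python
# Sketch: paired equivalence test (TOST) between NI and NIT.
# Equivalence bounds and data are hypothetical placeholders.
import numpy as np
from statsmodels.stats.weightstats import ttost_paired

rng = np.random.default_rng(1)
ni = rng.normal(loc=1.0, scale=0.5, size=40)        # placeholder NI scores
nit = ni + rng.normal(loc=0.0, scale=0.2, size=40)  # placeholder NIT scores

# low/upp encode the smallest difference you would still call meaningful
pval, lower, upper = ttost_paired(nit, ni, low=-0.25, upp=0.25)
print(f"TOST p = {pval:.3f}")  # significant -> equivalent within the bounds
```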
I hope all this makes sense and we can get this study over the line to IPA soon. I'm sure you are eager to commence this study!
DOI or URL of the report: https://osf.io/8952c
Version of the report: 5
Dear authors
Thank you for submitting your revision of the RR manuscript and addressing all the reviewer's comments. I have decided not to send the manuscript out for review again. However, there are still a number of critical issues that must be resolved before we can recommend this study for in-principle acceptance:
The normalised (baseline corrected) data will be used for analyses, with a new scale from -100 to +100 with 100 indicating strongly agree, 50 indicating a neutral opinion, and scores below 0 indicating strongly disagree with the statements on the questionnaire.
Wouldn't the neutral point be 0 in this case or am I misunderstanding how this is calculated?
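To spell out my reading with a toy example, assuming (my assumption, not the manuscript's) that the raw scale runs 0-100 with 50 as the neutral midpoint:

```python
# Sketch: baseline-correcting a 0-100 agreement scale to -100..+100.
# The raw scale and its midpoint are my assumptions, for illustration.
def normalise(raw):
    return 2 * (raw - 50)  # maps 0 -> -100, 50 -> 0, 100 -> +100

print(normalise(50))  # 0: the neutral point lands at 0, not 50
```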
I am very sorry for this protracted review process but this is being done for a reason. As always, please contact me prior to resubmission if anything is unclear or you want to double check the changes you plan to make. Please note that this will be the final round of review at which point I will either recommend or reject the manuscript.
Best wishes
Sam
DOI or URL of the report: https://osf.io/bjxsy
Version of the report: 4
Dear authors
Thank you very much for submitting this revised version of the manuscript. I appreciate you taking the reviewers' concerns and my own comments to heart, and that you have worked very hard to revise the study. By focusing only on the control participants in this submission, you have simplified matters considerably. This should pay off in the future if you submit a follow-up study using chronic pain patients.
The current manuscript has been reviewed by one of the previous reviewers, who still raises a list of substantial issues. I would therefore ask you to revise the manuscript one more time before it can be ready for in-principle acceptance (IPA). I realise this may seem frustrating (I am sure it is also for reviewers, which is perhaps why we only have one review now). However, Stage 1 RRs are different from conventional papers in that they effectively set in stone the bulk of your study. While it will always be possible to fix small issues and add exploratory aspects at Stage 2, it is important that any methodological flexibility is minimised at this point. Therefore, correcting imprecise language and other small details may seem pedantic, but this is important - such issues could result in ambiguity at Stage 2. The same goes for small errors and even typos if they pertain to the procedures used.
That said, handling the review process for RRs is also different in that it may not always be possible to reach a perfect consensus between authors and reviewers. Therefore, the recommender will sometimes have to make a judgement call and possibly overrule a reviewer (of course, the same applies to conventional papers; with RRs this only becomes more pertinent). I appreciate that some of the reviewer's comments can probably be considered matters of taste. In some cases, I also disagree. However, the reviewer has provided a very thorough evaluation of the manuscript thus far, and spotted several important issues that I missed. For the most part these should be addressed. I will not send this manuscript out for another full review. I will review any minor corrections myself, but I may ask the reviewer to briefly comment on changes in response to her major points. Therefore, please do prepare a point-by-point response as usual.
To facilitate this process, especially for points where you are unsure how to proceed, please feel free to contact me directly to discuss, as you already did before submitting this version. This way, we should be able to ensure that the next submitted version is ready for IPA. Also, to assist with the revisions, I will now briefly list my views on each of the major points:
1. Missing focus on “finger/hand” resizing (intro): I disagree; this seems largely a matter of taste to me. The final intro paragraph clearly communicates the aim of the study. If you agree with the reviewer that some references to previous studies could be clearer, especially w.r.t. which perceptual phenomenon they address, then by all means clarify this, but in the end this is your paper.
2. Missing references: Essential to fix! Thanks to the reviewer for spotting this.
3. Nonsignificant difference in pilot data: I assume what you were trying to convey here is that despite the difference between synch. and asynch. not being significant, there is a numerical suggestion of that effect? It is a tall order to expect significant differences in pilot data, and in fact applying inferential statistics to small pilot data inflates false negative rates, biasing science towards large effects. However, the reviewer is right that you shouldn't discuss a non-significant effect as if it was significant. Please clarify.
4. Circular analysis: I disagree. The sentence as it stands already concedes that there could be a statistical bias so my suggestion is to leave this as is.
5. Missing rationale for inclusion of 2 control conditions: Some clarification on why both non-illusion controls are needed would indeed be useful here although I wouldn't consider this essential.
6. Confusing usage of the term “NI conditions”: I agree. Referring to both non-illusion conditions in this manner, when one condition is itself called NI, is ambiguous and also leads to misunderstandings (see #7 below). This is also a good example of how imprecise language can become a hidden source of flexibility, since referring to "NI conditions" could later be interpreted as meaning either one or both. To avoid this confusion, I suggest renaming the NI condition to explicitly state that it has no tactile input (NIV, perhaps?). Then, throughout the text, make clear whether you are referring only to this condition or generally to both. When referring to hypotheses about both, please also clarify how you will interpret the result if only one of the conditions shows a significant effect.
7 & 8. Inconsistencies in analysis plan/statistics: Please fix these errors (e.g. mismatching power) and clarify the hypotheses. As pointed out above, there is some ambiguity about Hypothesis 2 and the conditions here. In particular, I have now realised that Hypothesis 2 refers to the NI conditions, and I apologise for not noticing this earlier:
Your ANOVA cannot inform you about these differences for the two individual conditions, as these are specific contrasts. That also ties in with the reviewer's point that the ANOVAs suggested in your analysis plan for Hypotheses 2a and 2b are the same test. To deal with this, I would define Hypothesis 2 as the ANOVA ("There is a significant difference in SSVEP between conditions") and then have 2a and 2b as specific t-contrasts. These should be corrected for multiple comparisons. Alternatively (and perhaps better?), you could use a 2x2 ANOVA where one factor is illusion vs control and the other is tactile vs visual-only. The ANOVA would test the main effect of illusion, and the posthoc comparisons would then be for MS and UV, respectively, against the two control conditions.
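A minimal sketch of that 2x2 alternative (using statsmodels' AnovaRM; the column names, sample size, and data are all hypothetical):

```python
# Sketch: 2x2 repeated-measures ANOVA with within-subject factors
# illusion (yes/no) and input (tactile/visual-only). Data are random
# placeholders; column names are hypothetical.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
n = 40  # placeholder sample size
data = pd.DataFrame({
    'subject':  np.repeat(np.arange(n), 4),
    'illusion': np.tile(['yes', 'yes', 'no', 'no'], n),
    'input':    np.tile(['tactile', 'visual', 'tactile', 'visual'], n),
    'ssvep':    rng.normal(size=4 * n),  # placeholder SSVEP amplitudes
})

aov = AnovaRM(data, depvar='ssvep', subject='subject',
              within=['illusion', 'input']).fit()
print(aov)  # the main effect of illusion is the test of interest here
```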
9. Incorrect Cohen's f: Please clarify.
10. Selective (circular) analysis: I assume these are fossils from the previous version that are no longer relevant. If so, these should of course be removed - otherwise please clarify.
11. Confusing terminology of illusion measure: Please clarify this, especially in light of which variables will actually be used in the analysis (another point where potential flexibility could creep in).
12. Number of posthoc tests in Hypothesis 1: I suspect this is also a fossil from the previous version. Shouldn't there only be 2 tests?
13. Sufficient power for SSVEP: It took me a while to wrap my head around this. But the reviewer is right that your pilot power spectrum only shows data collapsed across conditions. My reading of the pilot experiment is that it is used only to determine that the proposed stimulation frequency is feasible. You are not using this to inform your effect size of interest. You could, however, consider checking whether the differences in SSVEP power between conditions are in the ballpark you are expecting. I appreciate that this may not be a very reliable estimate, so I'd consider this optional.
I will not comment on all the minor points. However, if you are uncertain about any of them, feel free to contact me about these, too. Please don't write a detailed response letter to each of my points here - responding to the reviewer's comments is enough. As explained, I hope we can facilitate a swift turn-around of the manuscript, hopefully culminating in IPA. I am sure you are eager to finally get this experiment off the ground. In this context, please also be aware of the Christmas closure of PCI:RR, so let me know if you think you cannot submit your revision by 1st Dec.
Ngā mihi / Best regards
Sam Schwarzkopf
DOI or URL of the report: https://osf.io/cuq4e
Version of the report: 3
Dear authors
Your Stage 1 manuscript has now been reviewed by two of the original reviewers. While they both appreciate your hard work addressing some of their concerns, a number of fundamental issues remain. There is consensus between reviewers that the proposed experiment is very complex and that the introduction is too long and could be more focussed. I concur with both these points.
One reviewer previously suggested that the experiment could be split into two projects. I would consider that option, but at the very least the rationale should be clarified and the number of hypotheses could be reduced for clarity. As discussed in previous rounds, many of the hypotheses are actually main effects irrespective of group. In fact, the only group comparison is the final hypothesis, and it seems to be about the baseline condition - this leaves it unclear why having a control group is actually necessary.
Reviewing Stage 1 RRs differs from reviewing traditional manuscripts. It allows for an iterative process to reach a consensus on the best study before data collection commences. This can (and usually does) make the review process more collaborative and productive than for traditional papers. At the same time, we must be mindful not to overburden volunteer reviewers who are busy doing their own research and have other life commitments. We rely on reviewers' goodwill to dedicate their time to reviewing a great number of manuscripts. For this reason, if no substantive progress can be made towards a final manuscript, there comes a point at which we cannot ask for further revisions.
We are certainly not yet at this point, but the issues raised in the reviews are substantial. You will note that one review is considerably longer and more detailed than the other. In part, this comes down to personal style, but since a Stage 1 manuscript should ideally set in stone the rationale and methods, it is important to address smaller inconsistencies and also minimise any minor errors before it is locked in (although, if necessary, those could still be fixed at Stage 2). I also appreciate that full consensus may not always be possible - so a manuscript may be recommended (accepted) even if a reviewer is not fully satisfied. As recommender, I will make that call. But in the interest of avoiding reviewer fatigue, this will be the last round I will consider for major revisions.
Please don't hesitate to contact me directly with questions prior to submission.
Best wishes
Sam Schwarzkopf
It is clear that Hansford and colleagues have put a lot of work into addressing reviewers' comments, and I am generally content with the manuscript as it stands. I especially appreciate the addition of pilot data. However, there is one prevailing concern: the authors have still not really justified why there are so few between-group comparisons. Not only that, but they actually expect the same pattern of results in illusory experience and SSSEPs between patients and controls. Therefore, it is not clear why this approach should be tested in individuals with chronic pain (especially the neuroimaging). Clearer justification is needed for this, for their specific hypotheses, and also for why only one group comparison is being made.
I also have a few minor comments:
1. The introduction remains too long and should be condensed
2. I like the addition of Figure 2, but the current version is a little confusing and needs considerably more detail to be understandable. Does each vertical panel represent time? If so, some measure of time (and time elapsed) is needed. The direction of stretching should be included in all conditions that include a virtual finger stretch. What happens during the habituation stage (i.e. why are there two panels here when nothing changes from panel 3 to 4)? It should be made clear that these images represent the virtual feedback.
3. Although the authors have clearly justified their use of a paired-sample t-test, a pre-registered, exploratory ANOVA would add value here, as interaction effects might be worth exploring.
4. The justification for the minimum effect size of interest for hypothesis 2 (d = .50) outlined in the reviewer response is odd. Why is this the minimum effect size of interest for patient studies over and above what is reported in Lakens? I will not press the point, as I think the study will have enough participants to at least develop a clear idea of any differences resulting from the illusion, but the justification for the chosen effect size should be clear in the manuscript (and more logically explained than it was to me in the previous round of reviews).
DOI or URL of the report: https://osf.io/xmz3d
Version of the report: 2
Dear authors
First, let me apologise for the delay in getting back to you. Things slow down significantly around the end of the year and in January, which is summer in the southern hemisphere.
We have now received reviews of your Stage 1 RR from three experts in the field, and I have also included some comments of my own that did not warrant another triage (especially during the December closure period). As you can see, all reviewers raise concerns about the clarity of, and missing detail in, the experimental methods. There is also consensus that more information is required about the pilot study and about the rationale and theoretical background. The reviewers may disagree on some points. Obviously, those may require judgement calls on your part; the manuscript should communicate the proposed research clearly and effectively. You can justify your choices in the response letter.
Please include a version with tracked changes as this will expedite the next round of review.
Specific comments by Recommender:
Best regards
Sam Schwarzkopf
This Stage 1 Registered Report outlines a proposed study which will use multisensory stimulation to create the illusion of a resized hand in both healthy participants and participants who are suffering from chronic pain. The researchers propose to investigate the effect of a series of multisensory and unisensory conditions on pain reports and sense of illusion as measured by self-report. They further propose to use EEG combined with vibrotactile stimulation to examine the effect of each condition on frequency-locked steady-state evoked potentials centred between electrodes F1 and FC1.
The authors provide a detailed review of the existing literature and a justification of the rationale and potential benefits of this study, as well as a detailed account of the recruitment, measures, and data analysis that will be used in the development of this project.
As such, I am in favour of the acceptance of this project to PCI Registered Reports. However, I do have a number of questions and suggestions that I would like the authors to address prior to acceptance. I list these points below:
Page 7, paragraph 1 - In the procedure section, can the authors clarify whether the vibrations delivered to the finger by the solenoid will be present in all conditions? I assume from the design that this is the case, but it would be useful for that to be made clear.
Assuming it is the case, I do wonder if this presents any issues for the design as a whole, given that the authors note the analgesic effects of tactile input and the fact that some tactile input is constant throughout the study. I wonder if this might be addressed by an additional condition in which there is multimodal stimulation but no vibrotactile stimulation. While you would lose the neural data, this could at least be used for the chronic pain group to see whether the addition of vibrotactile stimulation modulates the resizing illusion's effect on pain response.
DOI or URL of the report: https://osf.io/g7r8q
Version of the report: 1
Dear authors
Thank you for your Stage 1 submission. We regularly triage submissions to ensure the manuscripts are ready for peer review. You have done a commendable job preparing your experimental protocol in accordance with the guidelines - as well as the spirit - of Registered Reports. However, some parts are still unclear or potentially a source of flexibility that could be tightened further. Some of the detailed description may also inadvertently reduce understanding.
1. Main hypothesis (Hypothesis 2):
"[...]there will be a significant main effect of condition between pain patients and healthy participants, measured via SSSEPs, when comparing (2a) multisensory visuo-tactile illusory resizing to a non-illusion condition, (2b) unimodal visual illusory resizing to a non-illusion condition, and when comparing (2c) multisensory visuotactile illusory resizing to unimodal visual illusory resizing."
I found it difficult to ascertain what exactly this hypothesis is, both in terms of the general question and the specific statistical test used. You describe this as a "main effect" but the statement reads like an interaction: you want to compare the "effect of condition" (i.e. chronic pain vs control) "when comparing" the experimental conditions. Put more simply, this asks whether the illusion strengths vary between groups. Judging by the description later on, this is not what you are actually testing?
2. Clearly specify all your hypotheses
In general, it is advisable, especially in RRs, to reduce any complex interaction effects down to the critical 1-df contrast that can actually address the question. This enhances the sensitivity of your analysis plan and can help minimise the required sample size. The way I read your description throughout the manuscript, I believe you are indeed looking at main effects, which are in essence 1-df t-tests between experimental conditions irrespective of pain group. In that case, however, it is unclear why you even need a control group.
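To make the 1-df idea concrete, here is a minimal sketch with placeholder data, computing an illusion-vs-non-illusion contrast per participant and testing it with a single t-test:

```python
# Sketch: collapsing the design into a single 1-df contrast.
# Data are random placeholders, one value per participant per condition.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
ms, uv, ni = (rng.normal(size=40) for _ in range(3))  # placeholder scores

# mean of the illusion conditions (MS, UV) minus the non-illusion condition
contrast = (ms + uv) / 2 - ni
print(stats.ttest_1samp(contrast, 0))  # one-sample t-test on the contrast
```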
This issue is most notable when looking at the design table in the Appendix. The Analysis Plan entries for Hypotheses 2a, 2b, and 2c are in fact all identical. You state there that, given significant main effects (which are the same for all three hypotheses!), you will run Tukey's posthoc tests. These Tukey posthoc tests are in fact the statistical contrasts your experiment should be powered to detect.
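For reference, a sketch of what those posthoc contrasts look like, with hypothetical data and labels (note that statsmodels' Tukey HSD treats groups as independent, so for a within-subject design corrected pairwise t-tests would be the closer analogue):

```python
# Sketch: Tukey HSD posthoc contrasts on placeholder data. Note that
# this implementation assumes independent groups.
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(4)
scores = rng.normal(size=120)               # placeholder outcome measure
labels = np.repeat(['MS', 'UV', 'NI'], 40)  # placeholder condition labels

print(pairwise_tukeyhsd(scores, labels))    # every pairwise contrast
```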
A similar point applies to Hypothesis 3 where you describe the hypothesis as "[...]reduction of pain [...] before and after each illusion, when comparing multisensory, unimodal visual, and non-illusion conditions". This hypothesis and the associated statistical tests do not actually compare the three experimental conditions. This also makes the interpretation ambiguous because it is unclear what constitutes support for Hypothesis 3.
3. Remove redundancy
While I commend you for specifying different comparisons and their potential outcomes, there is also a lot of redundancy in your text. I already mentioned the most striking redundancy in the design table in #2. But there are also numerous duplicates of statements in the text, especially in section 2.3.2. For most of the sub-hypotheses you describe it seems unnecessary to explicitly state how you interpret the outcomes because you are stating the obvious: if hypothesis 2a is supported, you found a significant difference. Note that such explicit statements can indeed be helpful, for example when you describe the positive control. If you fail to find a significant difference between the illusion and non-illusion conditions, this calls the results from Hypotheses 2 and 3 into question. However, for most of these comparisons I would argue that this repetition actually impedes the reader's understanding of what you're doing.
4. Interpretation of potential outcomes in Hypothesis 3
There is also some inconsistency in how these interpretations are described for Hypothesis 3. For the case where Hypothesis 3 is unsupported, you write: "Whereas if there are no changes in somatosensory cortex but pain reduction is seen. this shows that the driver of illusory induced analgesia is not coming from changes within the somatosensory cortex." Doesn't this sentence belong in the previous paragraph, because it means that Hypothesis 3 was in fact supported? (Minor side note: please also fix the typo before "this shows", where the period should be a comma.)
5. Power analysis
The power analysis is ostensibly based on previously reported minimal effect sizes, but many of these are simply the effect sizes you found (and in an as-yet unpublished study at that). This is not necessarily a problem, but you need to justify why this is a useful minimum effect size of interest. Would finding a smaller effect than this mean that we should accept the null hypothesis?
Moreover, I am confused about how you chose the effect sizes used to power Hypothesis 2. In the first paragraph of section 2.4.2 you list the effect sizes from your previous work. The smallest of these is Cohen's f = 0.27. In the following sentence you then explain that you chose larger effect sizes (f = 0.42, 0.63, and 0.4, respectively) for your power analysis because they "adhere to the lower end of the effect size range." This seems to be a contradiction.
There is also again a lot of redundancy in this section. Given that all the hypotheses described use the same statistical test, you only need to run a single power analysis based on the smallest effect size of interest (which should probably be lower than what you found in previous work). So instead of restating the power analysis for each effect size, you only need to do this once.
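For instance, a single calculation anchored to the smallest reported effect might look like the sketch below (f = 0.27 purely as an illustration, and treating it as a between-groups one-way ANOVA for simplicity):

```python
# Sketch: one sample-size calculation at the smallest effect size of
# interest (Cohen's f = 0.27 here, purely for illustration).
from statsmodels.stats.power import FTestAnovaPower

n_total = FTestAnovaPower().solve_power(
    effect_size=0.27,  # smallest Cohen's f of interest (placeholder)
    k_groups=2,        # e.g. patients vs controls
    alpha=0.05,
    power=0.90,
)
print(f"Total sample size: {n_total:.0f}")
```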
On that note, it is also not strictly necessary to state the actual power achieved, if your analysis is based on 90% power. Either say you calculated the sample size needed for 90% power, or state the power you have at that sample size.
Finally, the sample size you settled on is "85 participants (42 per group)". That sounds mathematically awkward - 42 per group across two groups gives 84, not 85.
6. Time line
The timeline is useful (although you may want to remove this at Stage 2, so perhaps it is better kept in the Appendix). However, it seems to contain an error: you state that you expect to need 3.5 months for recruitment, but this corresponds to the longest period in your Gantt chart, which appears to be data collection. The recruitment period is only a bit over a month - presumably the 7 weeks you mentioned previously needing to recruit 50 participants. I understand that recruitment and data collection go hand in hand, but as it stands this doesn't seem to add up.
Moreover, you aim to recruit 90 participants but your power analysis is based on n = 85. Aiming for more is certainly wise, but please clarify what happens if you end up with a sample of 85 or more but fewer than 90. The sampling plan in the classical frequentist framework should specify the exact sample size you plan to collect. Some flexibility in this can be mitigated statistically (e.g. by using Bayes Factors instead of frequentist tests), but if so, this needs to be part of the experimental plan.
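As a pointer only (any such rule would itself need to be preregistered), computing a Bayes Factor is straightforward, for example with the pingouin package; the data, prior, and threshold below are placeholders:

```python
# Sketch: Bayes Factor for a two-sample t-test via pingouin (its
# default Cauchy prior). Data are random placeholders.
import numpy as np
import pingouin as pg

rng = np.random.default_rng(2)
patients = rng.normal(loc=0.3, size=45)  # placeholder group scores
controls = rng.normal(loc=0.0, size=45)

res = pg.ttest(patients, controls)  # result table includes BF10
print(res['BF10'])  # e.g. stop sampling once BF10 > 10 or BF10 < 1/10
```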
Kind regards,
Sam Schwarzkopf