**Is borderline personality disorder linked to impairment of the tactile mirror system?**

**Chris Chambers** based on reviews by Zoltan Dienes and 2 anonymous reviewers

### Cortical plasticity of the tactile mirror system in borderline personality disorder

### Abstract


*Submission: posted 05 January 2023*

*Recommendation: posted 25 June 2023, validated 25 June 2023*

**Cite this recommendation as:**

Chambers, C. (2023) Is borderline personality disorder linked to impairment of the tactile mirror system? *Peer Community in Registered Reports*. https://rr.peercommunityin.org/articles/rec?id=367

#### Recommendation

**URL to the preregistered Stage 1 protocol:** https://osf.io/sqnwd

**Level of bias control achieved:** Level 6.

*No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.*

**List of eligible PCI RR-friendly journals:**

- Advances in Cognitive Psychology
- Brain and Neuroscience Advances
- Cortex
- Imaging Neuroscience
- In&Vertebrates
- NeuroImage: Reports
- Peer Community Journal
- PeerJ
- Royal Society Open Science


**Conflict of interest:**

The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

*Evaluation round* **#3**

DOI or URL of the report: **https://osf.io/nhrzp?view_only=c0d7ffddc6d647949099e9b6e26f6f49**

Version of the report: 2

#### Author's Reply, 10 Jun 2023

#### Decision by **Chris Chambers**, *posted 05 Jun 2023, validated 05 Jun 2023*

I consulted with Zoltan Dienes again and most issues are now settled. You will see that there are two remaining points to address concerning the selection of the smallest effect size of interest and the statistical testing procedure. Please consider these points carefully. Provided you are able to address these issues in a final revision, we should be able to proceed with Stage 1 in-principle acceptance without requiring further in-depth review.

#### Reviewed by **Zoltan Dienes**, 01 Jun 2023

The authors have clarified that they will base their decisions on one system (frequentist hypothesis testing), making clear how they have tied their inferential hands; they tie their hands in the same way for assumption testing by stating exactly how they will test normality. (Using power in this way does not make best use of all the data once they are in, but that is the authors' choice.) BFs will be reported for information only. This deals with a key issue, but a couple of points remain:

1) The use of power to determine N means the authors need to justify a roughly smallest effect of interest. They say "we based power analysis on the lowest available effect size, whenever possible." But the paper itself lists a single past study for each test and uses the effect size from that study. Technically, this may then be the smallest available effect, because there is just one. But then Type II errors have not plausibly been controlled so as not to miss very interesting effects. My main concern is respecting the spirit of the point, but here is a particular suggestion. Following ideas in the paper I previously referred to (https://doi.org/10.1525/collabra.28202), they could put an 80% CI on the single most relevant previous study and use the lower limit of the CI as an estimate of a smallish effect that is just plausible; so long as that effect is still interesting, it could form the basis of the power analyses.

2) There are arguments for why it is better to use robust tests from the beginning rather than the two-step "significance test of assumptions -> choose test" procedure (e.g. Field & Wilcox, 2017, https://doi.org/10.1016/j.brat.2017.05.013). The authors are aware of these issues but suggest that, because they are dealing with 2×2×2 effects, Yuen robust t-tests are ruled out. In fact, as far as I could tell, all crucial tests in the Design table involve either a repeated-measures main effect or interaction within HC, or the difference in such an interaction between the two groups. While not all terms of the 2×2×2 ANOVA can easily be run as a Yuen t-test, all terms that involve an interaction with group can be, and all purely repeated-measures interactions within one group can be tested with a robust one-sample t-test. As I say, as far as I can make out, that applies to all crucial tests. Since all other tests are exploratory (pre-registered conclusions will not be drawn from them), they should in any case be reported in a separate section. This reopens the option of being robust from the start, and it considerably simplifies the pre-registration. I leave this to the authors' judgment.
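As a rough illustration of the second point, a within-group 2×2 interaction can be collapsed into one contrast score per participant and then tested against zero with a one-sample trimmed-means (Tukey-McLaughlin) t-test. The sketch below assumes 20% trimming, uses hypothetical function names, and is plain Python rather than any particular statistics package; it is not the authors' pipeline.

```python
import math


def trimmed_mean(xs, gamma=0.2):
    """Mean after removing floor(gamma * n) observations from each tail."""
    s = sorted(xs)
    g = int(math.floor(gamma * len(s)))
    kept = s[g:len(s) - g]
    return sum(kept) / len(kept)


def winsorized_variance(xs, gamma=0.2):
    """Sample variance (n - 1 denominator) of the winsorized sample."""
    s = sorted(xs)
    n = len(s)
    g = int(math.floor(gamma * n))
    # Replace the g lowest values with s[g] and the g highest with s[n - g - 1].
    w = [s[g]] * g + s[g:n - g] + [s[n - g - 1]] * g
    m = sum(w) / n
    return sum((v - m) ** 2 for v in w) / (n - 1)


def interaction_contrast(a1b1, a1b2, a2b1, a2b2):
    """Per-participant 2x2 interaction score: (a1b1 - a1b2) - (a2b1 - a2b2)."""
    return (a1b1 - a1b2) - (a2b1 - a2b2)


def trimmed_one_sample_t(xs, mu0=0.0, gamma=0.2):
    """Tukey-McLaughlin one-sample t-test on the trimmed mean.

    Returns (trimmed mean, standard error, t, degrees of freedom).
    """
    n = len(xs)
    g = int(math.floor(gamma * n))
    tm = trimmed_mean(xs, gamma)
    se = math.sqrt(winsorized_variance(xs, gamma)) / ((1 - 2 * gamma) * math.sqrt(n))
    t = (tm - mu0) / se
    df = n - 2 * g - 1
    return tm, se, t, df
```

For example, with hypothetical per-participant means, `interaction_contrast(post20, pre20, post100, pre100)` gives (post20 - pre20) - (post100 - pre100) for one HC participant, and `trimmed_one_sample_t` is then applied to the list of those scores.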

*Evaluation round* **#2**

DOI or URL of the report: **https://osf.io/bufja?view_only=c0d7ffddc6d647949099e9b6e26f6f49**

Version of the report: r1

#### Author's Reply, 22 May 2023

#### Decision by **Chris Chambers**, *posted 14 May 2023, validated 14 May 2023*

The three reviewers who assessed your initial Stage 1 submission returned to evaluate the revised manuscript, and the good news is that the manuscript is getting closer to achieving Stage 1 in-principle acceptance (IPA). As you can see, however, there remain some significant issues to address in the review by Zoltan Dienes to ensure that the hypotheses, sampling plans, and *a priori* evidence thresholds are fully defined and justified. As Zoltan is a member of the PCI RR Managing Board, feel free to contact him directly if you need any additional assistance addressing these points.

I look forward to receiving your revised manuscript, which will be re-evaluated by Zoltan and the Managing Board before issuing a final Stage 1 decision.

#### Reviewed by anonymous reviewer 1, 05 May 2023

The authors have addressed my previous points with the exception of the following. Exclusion criteria (page 6, Figure 1): for group-level exclusions, the text says ±2 SD but Figure 1 says 2.5 SD; please clarify which is correct.

#### Reviewed by anonymous reviewer 2, 11 May 2023

Thanks to the authors for the modifications made to the manuscript and for taking my comments into account. I have no further comments.

#### Reviewed by **Zoltan Dienes**, 26 Apr 2023

The key point of my review, making sure the sizes of effects in relevant predictions are scientifically justified, has not been addressed. The authors have added Bayes factors, though without specifying or justifying the model of H1 used in them, but have still used frequentist power analysis to justify N. Further, they have used a decision threshold of 1 for the Bayes factor, allowing decisions based on virtually no evidence. Finally, if decisions are to be made with respect to the Bayes factor (according to the Design Template), it is not clear what, if anything, will be done with the frequentist statistics (which, according to the text, drive the interpretation); there is scope for plenty of inferential flexibility here. One inferential system must be fully specified, including how conclusions will follow from that system.

I am happy for the authors to go over to Bayes factors if they wish (in fact I think it will be easier, but I leave it to them to judge). But they need:

1) To estimate an expected sample size based on the properties of the Bayes factor they use, not based on frequentist power analysis. For a very quick overview of how to do this, see e.g. https://www.youtube.com/watch?v=10Lsm_o_GRg

For the above estimation of N, the BFs that will be used to test predictions must be used. And for each of those BFs, the predictions of the theory need to be modeled - the model of H1. That is, my point about power requiring justification of the effect size used is not avoided by using Bayes factors. Let me be clear about why:

To test a theory via one of its predictions, the prediction must be justified as relevant to the theory. Obviously, if it is not relevant, falsifying the prediction does not count against the theory. Example i): if the prediction is for an effect, and the exact size of the effect found in a previous study is used for power, the prediction is effectively modeled as: the minimally interesting effect is the effect in the previous study; that is, missing any effect smaller than this is OK, as such effects are too small to be interesting. But this final conclusion is typically false. Typically, effects smaller than that found in a previous study are interesting. That is why PCI RR says in the Guide for Authors: "Power analysis should be based on the lowest available or meaningful estimate of the effect size." Example ii): if an arbitrary number like d = 0.5 is used for the prediction (arbitrary in many ways, including the dependence of the population d on the number of trials in the study), then the prediction is arbitrarily related to the theory (just as the number of trials used is arbitrarily related to the truth of a theory), and falsifying the prediction does not count against the theory. (To address this argument, the argument itself must be addressed: claims about what standard practice is do not address it.)

2) To specify the models of H1. A Bayes factor pits a model of H1 against a model of H0. For the BF to be relevant to the theory, the model of H1 should be relevant to modeling the predictions of the theory. Modeling H1 can be thought of as boiling down to estimating a roughly predicted size of effect (in contrast to power, which needs a roughly smallest effect of interest).

So how to model each prediction? Some suggestions to consider:

Hypothesis I: This is straightforward, as a relevant previous study has been identified that used the same proposed questionnaire with groups very similar to those that will be used in the current study. As a roughly expected effect size is what is relevant, the previous difference in Likert scale units could be used as the SD (scale factor) of a half-normal. See the replication heuristic of https://doi.org/10.1525/collabra.28202

Hypothesis II: Similarly, there is a previous study using the same paradigm; use its raw effect size as the SD of a half-normal.

Hypothesis III: Use the room-to-move heuristic of https://doi.org/10.1177/2515245919876960

Hypothesis IV: Use the room-to-move heuristic of https://doi.org/10.1177/2515245919876960
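The two requirements set out in this review, an expected N derived from the Bayes factor itself and an explicit model of H1 such as a half-normal scaled by a previous raw effect, can be combined in a small design analysis. The sketch below computes BF10 for a normal likelihood (observed mean difference and its SE) against H1 modeled as a half-normal, then simulates many two-group studies at a candidate N to estimate how often the BF reaches 3 or falls to 1/3. It assumes a known raw SD, a normal approximation to the sampling distribution of the mean difference, and hypothetical parameter values; it is a sketch, not the calculator linked in this review.

```python
import math
import random
from statistics import NormalDist


def bf_half_normal(mean_diff, se, h1_scale):
    """BF10 with H1: effect ~ half-normal(0, h1_scale) and H0: effect = 0.

    The observed mean difference is treated as normal with standard error `se`.
    """
    # Likelihood under H0: observed difference given a true effect of exactly zero.
    like_h0 = NormalDist(0.0, se).pdf(mean_diff)
    # Under H1, the normal likelihood times a normal prior integrates in closed form;
    # folding the prior onto positive effects doubles its density and keeps only
    # the positive part of the resulting posterior.
    total_sd = math.sqrt(se ** 2 + h1_scale ** 2)
    post_sd = se * h1_scale / total_sd
    post_mean = mean_diff * h1_scale ** 2 / total_sd ** 2
    like_h1 = 2.0 * NormalDist(0.0, total_sd).pdf(mean_diff) * NormalDist().cdf(post_mean / post_sd)
    return like_h1 / like_h0


def proportion_conclusive(n_per_group, true_diff, sd, h1_scale,
                          n_sims=2000, threshold=3.0, seed=1):
    """Simulate two-group studies; return (P(BF10 >= threshold), P(BF10 <= 1/threshold))."""
    rng = random.Random(seed)
    se = sd * math.sqrt(2.0 / n_per_group)  # SE of a mean difference, SD assumed known
    for_h1 = for_h0 = 0
    for _ in range(n_sims):
        observed = rng.gauss(true_diff, se)  # draw an observed mean difference
        bf = bf_half_normal(observed, se, h1_scale)
        if bf >= threshold:
            for_h1 += 1
        elif bf <= 1.0 / threshold:
            for_h0 += 1
    return for_h1 / n_sims, for_h0 / n_sims
```

In use, one would raise `n_per_group` until the proportion of conclusive BFs is acceptable, both with `true_diff` set to the expected effect and with `true_diff=0.0` (to see how often evidence for H0 is obtained).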

More analytic flexibility is introduced by not having scripted normality checks. I actually agree that the standard significance tests of normality are pretty much useless and that checking by eye can be better. But that leaves analytic flexibility. In fact, the authors propose to use robust t-tests (trimmed, Yuen t-tests). In general there is no reason to think non-parametric tests produce more relevant results than robust t-tests, so I would suggest simply using the robust t-tests (performing the equivalent Bayes factors by taking the robust means and SEs that go into the robust t-test and entering them into a BF calculator such as https://harry-tattan-birch.shinyapps.io/bayes-factor-calculator/). The authors worry about robust ANOVAs. But in fact the crucial test in every case has one degree of freedom, so the relevant contrast reduces to a t-test. In practice: find the relevant differences, or differences of differences, for the repeated-measures variables; then perform a robust two-group t-test on the difference of differences (etc.); from the robust t-test, extract the robust SE of the effect and the robust estimate of the effect and enter these into the BF calculator.
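The procedure at the end of this paragraph can be sketched as follows: compute a difference of differences per participant, run Yuen's two-sample trimmed-means t-test across groups, and keep the robust effect estimate and its SE for a Bayes factor calculator. The sketch assumes 20% trimming and uses hypothetical function names; it follows the standard Yuen formulas (trimmed means, winsorized variances, Welch-type df) rather than any code from the authors.

```python
import math


def _trim_stats(xs, gamma):
    """Return (trimmed mean, Yuen's squared-SE term d, effective size h) for one group."""
    s = sorted(xs)
    n = len(s)
    g = int(math.floor(gamma * n))
    h = n - 2 * g
    tmean = sum(s[g:n - g]) / h
    # Winsorize: pull the g lowest and g highest values in to the nearest kept value.
    w = [s[g]] * g + s[g:n - g] + [s[n - g - 1]] * g
    wmean = sum(w) / n
    wvar = sum((v - wmean) ** 2 for v in w) / (n - 1)
    d = (n - 1) * wvar / (h * (h - 1))
    return tmean, d, h


def yuen_t(group1, group2, gamma=0.2):
    """Yuen's trimmed-means t-test for two independent groups.

    Returns (robust effect estimate, robust SE, t, Welch-type df).
    """
    m1, d1, h1 = _trim_stats(group1, gamma)
    m2, d2, h2 = _trim_stats(group2, gamma)
    se = math.sqrt(d1 + d2)
    effect = m1 - m2
    df = (d1 + d2) ** 2 / (d1 ** 2 / (h1 - 1) + d2 ** 2 / (h2 - 1))
    return effect, se, effect / se, df
```

Here `group1` and `group2` would hold one difference-of-differences score per participant, e.g. (post20 - pre20) - (post100 - pre100) for each HC and each BPD participant (hypothetical variable names); `effect` and `se` are the two numbers a BF calculator of the kind linked above takes as input.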

If the authors have questions on the above they are welcome to contact me, if they wish.

*Evaluation round* **#1**

DOI or URL of the report: **https://osf.io/cjs6h?view_only=c0d7ffddc6d647949099e9b6e26f6f49**

Version of the report: 1

#### Author's Reply, 20 Apr 2023

#### Decision by **Chris Chambers**, *posted 20 Mar 2023, validated 20 Mar 2023*

Three reviewers have now completed an initial evaluation of your manuscript, and I have also read it with interest myself. Overall, the reviews are encouraging about the potential for Stage 1 acceptance, following a thorough revision to strengthen various elements of the study design and presentation. Among the various comments, the reviewers highlight the need for clarifications to the study rationale, procedural details, and analysis plans. Two of the reviewers suggest adopting an alternative (or at least complementary) analysis plan involving Bayes factors, and I would very much encourage you to consider this because the study outcomes will then be more informative, regardless of the results. If you eventually adopt both frequentist and Bayesian inferences, be sure to specify which outcomes (the Bayesian or frequentist) will shape the conclusions. Other comments should be straightforward to address by adding minor details to the manuscript or noting in your response where a particular detail was missed (e.g. I note that point 4 of the 2nd anonymous reviewer, the definition of S1, is already stated on p7).

I look forward to receiving your revision and response, which I will return to the reviewers for re-evaluation.

#### Reviewed by anonymous reviewer 1, 14 Mar 2023

Review of Stage 1 registered report by Zazio et al. The points below are organised according to the stage 1 criteria.

1A. Validity of the research question. The authors postulate differences in associative learning mechanisms in borderline personality disorder. It’s not clear why they hypothesise that these differences should only be present for social associations, i.e. tactile mirroring. A more general deficit in associative learning should have widespread effects on cognitive functioning. Please present evidence for such a deficit or a clearer rationale for why BPD patients should have specific differences in tactile mirroring.

1B. Proposed hypotheses. The authors propose that cm-PAS will improve tactile acuity but ‘decrease performance’ on the visual-tactile spatial congruity task. Please specify more clearly how the decrease in performance will be indexed, i.e. as an increase in response times on incongruent trials, a decrease in response times on congruent trials, or both?

1C. Feasibility of methodology and analysis. The analyses are not sufficiently clear at present. Here are some required improvements:

What is your approach to outlying data points (at the trial and at the participant level)?

Please specify which analyses will be performed on the tactile acuity ‘global performance’ measures (d’ and criterion).
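For reference, the "global performance" measures named here are usually computed from trial counts as below: d' is the difference between the z-transformed hit and false-alarm rates, and the criterion c is minus half their sum. The log-linear correction (adding 0.5 to each count) is an assumption made here to keep the z-scores finite when a rate would be 0 or 1; the authors' protocol may handle extreme rates differently.

```python
from statistics import NormalDist


def dprime_criterion(hits, misses, false_alarms, correct_rejections):
    """Signal-detection d' and criterion c, with a log-linear correction.

    Adding 0.5 to every count keeps both rates strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion
```

A positive c indicates a conservative bias (fewer "yes" responses) and a negative c a liberal one, so the two measures can change independently after cm-PAS.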

The dependent variable for the VTSC task is specified as the difference between incongruent and congruent trials. However, you are also measuring tactile-only trials. Please explain how the tactile-only trials will be incorporated in this analysis, as your analysis plan only specifies ‘VTSC measures’. Please also specify follow-up analyses if an effect of cm-PAS is found on the difference between incongruent and congruent trials: as noted under 1B above, the difference could be generated by changes to processing for the incongruent trials, the congruent trials, or both; how will you investigate this?
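One way to make this question concrete, assuming the tactile-only trials can serve as a neutral baseline, is to split the congruency effect into an interference part (incongruent minus tactile-only) and a facilitation part (tactile-only minus congruent), which sum to the congruency effect by construction. The sketch below uses mean RTs and hypothetical names; whether the tactile-only condition is an appropriate baseline is for the authors to justify.

```python
from statistics import mean


def vtsc_components(rt_congruent, rt_incongruent, rt_tactile_only):
    """Decompose the congruency effect relative to a tactile-only baseline.

    congruency   = incongruent - congruent
    interference = incongruent - tactile-only
    facilitation = tactile-only - congruent
    so congruency == interference + facilitation.
    """
    con = mean(rt_congruent)
    inc = mean(rt_incongruent)
    base = mean(rt_tactile_only)
    return {
        "congruency": inc - con,
        "interference": inc - base,
        "facilitation": base - con,
    }
```

Comparing the change in each component from pre to post cm-PAS would then indicate whether an effect on the congruency score is driven by incongruent trials, congruent trials, or both.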

The main analysis comparing the effect of cm-PAS across control and clinical groups compares the two groups at the two timepoints (pre and post cm-PAS) for the various dependent variables, on the cm-PAS 20ms condition only. Given that the cm-PAS 100ms condition is a crucial control condition, please add another factor of cm-PAS condition (20ms, 100ms) to the analysis. Please also specify the dependent variables more clearly, as per the points above.

Please consider including session order as a variable to account for learning / carry-over effects.

1D. Methodological detail.

Please provide more detail on where patients and controls will be recruited from. Will both groups be community samples, for example? The authors mention matching on gender and age: will this be on a case-control basis, and if not, how will it be done? What about the ethnicity of participants? The hand stimuli display a White hand, so will participant ethnicity be matched across groups?

Which subscale(s) of the QCAE were used to measure cognitive empathy in the pilot data (page 4, point i) under sample size)? And which will be used in the main study (also which IRI subscales will be the focus of analysis)?

Sample size estimation for the VTSC measure – I appreciate that the effect of cm-PAS on this measure has not previously been tested, making it difficult to estimate the likely effect size. The authors have therefore based sample size estimate on the visual acuity task. Please provide some indication of the relative variation in performance across the visual acuity and VTSC tasks or some other comparison between these tasks to convince readers that the effect of cm-PAS on the VTSC task is likely to be of the same order of magnitude as its effect on the visual acuity task.

Pharmacological treatments for BPD are likely to mean many potential participants in this group have contra-indications for TMS. Please comment on how representative your eventual sample is of patients with BPD in general.

Figure 1 suggests ISIs of 20 and 150ms, text suggests 20 and 100ms – which is correct?

With 11 participants per cm-PAS session order, task order cannot be fully counterbalanced – consider increasing sample size to 24 per group to allow full counterbalancing of session order and task order?
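The counterbalancing arithmetic can be checked directly: full counterbalancing needs the per-group N to be a multiple of the number of session-order × task-order cells. The sketch assumes two cm-PAS session orders and only the two tasks named in this review (tactile acuity and VTSC); the labels are hypothetical, and with more tasks the number of task orders grows factorially.

```python
from itertools import permutations, product


def counterbalancing_cells(session_orders, tasks):
    """Every combination of a cm-PAS session order with an order of the tasks."""
    return list(product(session_orders, permutations(tasks)))


def fully_counterbalanced(n_per_group, session_orders, tasks):
    """True if n_per_group can be spread evenly over all cells."""
    return n_per_group % len(counterbalancing_cells(session_orders, tasks)) == 0


def assign_cells(n_per_group, session_orders, tasks):
    """Assign participants to cells in rotation (participant i gets cell i mod k)."""
    cells = counterbalancing_cells(session_orders, tasks)
    return [cells[i % len(cells)] for i in range(n_per_group)]
```

Under these assumptions there are 2 × 2 = 4 cells, so 22 participants per group (11 per session order) cannot fill them evenly, whereas 24 gives 6 per cell.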

Will performance on catch trials on VTSC task be included as exclusion criterion? If so please specify performance cut-off for inclusion/exclusion, if not please indicate what the purpose of these catch trials is.

It would be desirable for the experimenter delivering the tactile stimulation in the tactile acuity task to be blinded as to participant group (control, BPD) and certainly as to cm-PAS condition (20ms, 100ms). Please confirm whether this will be the case.

1E. Outcome-neutral conditions. The effect of cm-PAS in healthy controls is included as a positive control.

#### Reviewed by anonymous reviewer 2, 07 Mar 2023

#### Reviewed by **Zoltan Dienes**, 06 Mar 2023

The authors have written clearly on the background to their study, its methods and how they will analyze the results. They have also considered the power for each effect separately. However, their power calculations are based on the mean estimates of previous work, which means they have not thereby controlled their error rates for missing effects somewhat smaller than this but still practically or theoretically interesting. That is my main point for the authors to address.

Hypothesis 1:

Two tests will be conducted. Under what conditions will it be asserted that the groups differ, and under what conditions that they are the same? If either of the two tests being significant results in the conclusion of a difference, then a correction for multiple testing is needed. How will sameness be inferred? Power was calculated for an effect of about d = 1. But presumably a known population difference of d = 0.5 would not count as the groups being sufficiently similar that they have practically the same empathy.
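If a difference is to be concluded when either of the two tests is significant, one standard family-wise correction is the Holm-Bonferroni step-down procedure, swapped in here for illustration (the review does not name a specific correction). The sorted p-values are compared against alpha/m, alpha/(m - 1), and so on, stopping at the first non-rejection.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure; returns reject decisions in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject
```

With two tests, the smaller p-value is compared with 0.025 and, only if it passes, the larger with 0.05; this is never less powerful than a plain Bonferroni correction.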

My personal approach would be to use a Bayes factor, with H1 modeled as a half-normal with an SD (scale factor) equal to the raw difference in questionnaire units previously found between the groups, because this is the predicted effect size. For power, however, one needs something else: namely, the effect size we just don't want to miss detecting. Otherwise we have no justification for asserting H0 should there be a non-significant result, because we have not controlled the Type II error rate for effects of interest (see https://doi.org/10.1525/collabra.28202). One heuristic in that reference that may be useful is to use the lower end of a CI (maybe an 80% CI) on previous tests of BPD-HC differences as the roughly smallest plausible effect that may still be of interest.
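The CI heuristic can be sketched as follows: put an 80% CI on a previous standardized group difference, take its lower bound as the roughly smallest plausible effect of interest, and compute the sample size needed to detect it. The sketch assumes the common large-sample SE of Cohen's d and a normal approximation for power (which gives a slightly smaller N than t-based tools); the example values are hypothetical, and whether the lower bound is still an interesting effect remains a scientific judgment.

```python
import math
from statistics import NormalDist


def d_confidence_interval(d, n1, n2, level=0.80):
    """Approximate CI for Cohen's d from two independent groups (large-sample SE)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation N per group for a two-sided, two-sample test of effect d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


# Hypothetical previous study: d = 1.0 with 20 participants per group.
lower, upper = d_confidence_interval(1.0, 20, 20)
required_n = n_per_group(lower)
```

Powering for the lower bound rather than the point estimate roughly triples the required N in this hypothetical case, which is the cost of controlling Type II errors for smaller but still interesting effects.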

Hypothesis 2:

The issue of having enough power to assert there is no interaction arises here as well. Admittedly the authors do not claim they will assert there is no interaction, only that a non-significant result will not confirm that one exists. But that is an inferentially weak position to be in: It would be good to have evidence whether the effect does or does not hold. Again one could either use a Bayes factor, or else use the lower end of a CI to inform the power calculation.

Hypothesis 4:

Rather than predicting simply a medium size effect, which does not have a scientific justification as such, it may be best to think in terms of what difference in (e.g.) d' would be sufficiently meaningful.

Other points:

Abstract: "Here, we take advantage of a cross-modal PAS (cm-PAS) protocol"

define PAS.

p 7

" after transforming the data in case of non-normality of the distribution."

How exactly will non-normality be established? Some details are given later, but not a precise decision rule. What transformations will be used? Provide a set of if-then rules to tie down analytic flexibility.

" The sensory threshold will be estimated by fitting a logistic function to d’ values (transformed to fit in a range between 0 and 1;" State how the fitting is done.
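As one illustration of what stating the fitting procedure could involve, the sketch below fits a two-parameter logistic (threshold alpha at the 0.5 point, slope parameter beta) to scores already scaled to the 0 to 1 range, by least squares over a parameter grid. The grid search, the grid ranges, and the assumption that the d' values have already been rescaled are all choices made here for illustration; maximum-likelihood fitting with a dedicated psychometric-function tool would be a common alternative.

```python
import math


def logistic(x, alpha, beta):
    """Logistic function equal to 0.5 at x == alpha; smaller beta means a steeper slope."""
    return 1.0 / (1.0 + math.exp(-(x - alpha) / beta))


def fit_logistic(intensities, scores, alphas, betas):
    """Least-squares grid search; returns (sse, alpha, beta) of the best-fitting pair."""
    best = None
    for a in alphas:
        for b in betas:
            sse = sum((y - logistic(x, a, b)) ** 2 for x, y in zip(intensities, scores))
            if best is None or sse < best[0]:
                best = (sse, a, b)
    return best


# Hypothetical grids: thresholds 0.00 to 4.00 and slopes 0.05 to 1.00, in steps of 0.01.
ALPHAS = [i * 0.01 for i in range(401)]
BETAS = [0.05 + i * 0.01 for i in range(96)]
```

The fitted alpha is the stimulus intensity at which the rescaled d' reaches 0.5, which would be reported as the sensory threshold under these assumptions.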