Recommendation

Testing the metacognitive basis and benefits of mindfulness training

Rob McIntosh, based on reviews by Chris Noone and Julieta Galante
A recommendation of:

Minimal mindfulness of the world as an active control for a full mindfulness of mental states intervention: A Registered Report and Pilot study

Submission: posted 21 June 2021
Recommendation: posted 10 February 2022, validated 14 February 2022
Cite this recommendation as:
McIntosh, R. (2022) Testing the metacognitive basis and benefits of mindfulness training. Peer Community in Registered Reports. https://rr.peercommunityin.org/PCIRegisteredReports/articles/rec?id=45

Recommendation

Mindfulness is inherently metacognitive in that it requires monitoring of one’s own thoughts and attention in order to remain on task. Mindfulness practice is especially metacognitive when focused on internal mental states, rather than on the external world. This Registered Report will compare remote training in mindfulness of mental states with remote training in mindfulness of the world, and a wait list control, to test the idea that mindfulness of mental states has an additional metacognitive component, and that this has benefits for (self-reported) mental health. A comparison of participant expectancies between mindfulness conditions will be used to establish whether mindfulness of the world can be considered a true ‘active control’ condition. Additional comparisons will test whether this mindfulness control has benefits over the (inactive) waiting list condition. The study plan is informed by prior research, including pilot data presented in the Stage 1 report, and a sample of up to 300 participants will be tested.

The Stage 1 plan has been evaluated through two rounds of signed external review, and a further two rounds of minor revisions, with the recommender obtaining specialist advice on key points from a relevant external expert. The recommender has judged that the manuscript now meets all Stage 1 criteria and has awarded In Principle Acceptance (IPA).

URL to the preregistered Stage 1 protocol: https://osf.io/tx54k

Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.


References

Lovell, M., & Dienes, Z. (2022). Minimal mindfulness of the world as an active control for a full mindfulness of mental states intervention: A Registered Report and Pilot study, in principle acceptance of version 4 by Peer Community in Registered Reports. https://osf.io/tx54k

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Evaluation round #4

DOI or URL of the report: https://psyarxiv.com/3umz7

Version of the report: v4

Author's Reply, 10 Feb 2022

Download tracked changes file

Dear Prof. McIntosh,

Great to get Prof. Fleming's thoughts on the methods - I was unaware of the differences this would make. I've made those changes to the document. Note that the previous studies did use individual meta-d', as this was before the HMeta-d technique was published, so no changes to H1 or the sample size estimates are required. I believe the only difference between our method of fitting meta-d' and theirs, then, will be their use of MLE and our use of Bayesian methods.

Many Thanks,

 

Max

Decision by Rob McIntosh, posted 09 Feb 2022

Dear Mr. Lovell,

Thanks for your response and revisions on the issue of the measurement of metacognitive efficiency (and its implications for your power analysis).

The regression approach that you are proposing seems logically reasonable to me; however, because it represents (as far as I can tell) a novel approach to the analysis of metacognitive efficiency, I thought it wise to get a second opinion from a recognised expert in metacognition.

Prof. Steven Fleming has provided the following helpful comment to me by email:

"I had a brief look at this section of the methods in the manuscript. I don’t in principle see any problem with the regression approach, and it’s interesting that this achieves greater sensitivity. 

There is only one caveat to this. In single-subject routines for fitting meta-d’ (obtained via maximum likelihood estimation or Bayesian techniques, as implemented in our HMeta-d toolbox), the ratio is calculated post-hoc, by dividing through by the point estimate of d’. Such values of meta-d’ would therefore be suitable for entry into a regression with the d’ point estimate as a covariate. In contrast, the hierarchical version of the HMeta-d model directly seeks to estimate the ratio at the group level. The meta-d’ values for single subjects then become non-independent, as they are affected by inference on this group-level ratio parameter. These would then not be suitable for entry into a post-hoc regression.

From the methods, it was not clear whether they were using the HMeta-d toolbox in single-subject or hierarchical mode (it says “hierarchical Bayesian method” which is ambiguous). If the former, this would be fine – but it would be worth establishing this is indeed the case."

I am returning the manuscript to you one more time so that you can consider and address this important point. Note that the ambiguity over whether you would be using single-subject or hierarchical mode implies that the wording of your methods needs to be more precise; please bear this in mind and include any further clarifications necessary to avoid ambiguity in your methods.
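As a concrete illustration of the caveat quoted above, the following R sketch uses invented per-participant values (it is not part of the registered analysis) to show which estimates are suitable for post-hoc treatment:

```r
# Hypothetical per-participant point estimates from single-subject fits
# (MLE or non-hierarchical Bayesian); the numbers are illustrative only.
meta_d <- c(1.10, 0.62, 0.95, 0.78, 1.30, 0.55)   # meta-d' point estimates
d      <- c(1.35, 0.80, 1.20, 0.95, 1.40, 0.85)   # d' point estimates

# Post-hoc efficiency, obtained by dividing through by each subject's own d';
# these independent values (or meta_d with d as a covariate) can enter a
# subsequent regression.
m_ratio <- meta_d / d

# By contrast, subject-level meta-d' values taken from a hierarchical HMeta-d
# fit are shrunk towards the group-level ratio, are therefore non-independent,
# and should not be entered into such a post-hoc analysis.
m_ratio
```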

Best wishes,

Rob

 

 

Evaluation round #3

DOI or URL of the report: https://psyarxiv.com/3umz7

Version of the report: v4

Author's Reply, 07 Feb 2022

Download tracked changes file

Dear Prof. McIntosh,

Thank you for your replies.

We checked the sensitivity to pick up expected effects with a set of possible measures: meta-d', meta-d'/d', log(meta-d'/d'), and also meta-d' as DV with d' as a covariate. The reason for our concern is that when a ratio is used, the percentage error in measurement of the ratio depends on the percentage error in the denominator, so ratios can be noisier (than, e.g., differences or unadjusted DVs, since for differences it is merely the absolute errors that determine the absolute error of the difference). Conversely, adjusting for d' (given that meta-d' varies with d') may take noise out. So it is not obvious whether the ratio will be noisy. However, when we checked the estimated N needed for meta-d' (now using both the Schmidt and Carpenter data) it is 110, for the ratio it is 300, whilst for log(meta-d'/d') it is 330. This is substantial, so a certain amount of shooting ourselves in the foot would be going on by using the ratio. Analysing meta-d' with d' as a covariate gives an estimated N of 160. The latter approach controls for d' AND yields a lower expected N than the ratio (or its log). This is not surprising, considering the general preference in the statistical literature for using regression to control for variables rather than differences or ratios. Note that, conceptually, the adjusted meta-d' is efficiency, because it is metacognitive capacity controlling for Type 1 performance, so it answers your query. So as not to confuse the different measures, however, we do not refer to it as efficiency, but as adjusted meta-d'. We now discuss this issue in the paper, which should be of interest to all metacognition researchers.
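To make the "adjusted meta-d'" approach concrete, here is a minimal R sketch with simulated data; the group sizes, effect sizes, variable names, and the roughly constant d' level are invented assumptions and this is not the registered analysis script:

```r
# Simulated post-training data for two groups; values are illustrative only.
set.seed(1)
n_per_group <- 80
df <- data.frame(
  group   = rep(c("mental_states", "world"), each = n_per_group),
  d_prime = rnorm(2 * n_per_group, mean = 0.75, sd = 0.15)   # held near-constant by staircasing
)
df$meta_d <- 0.8 * df$d_prime +
  ifelse(df$group == "mental_states", 0.15, 0) +   # hypothetical training effect
  rnorm(2 * n_per_group, sd = 0.25)

# "Adjusted meta-d'": meta-d' as DV with d' entered as a covariate,
# rather than dividing by d' to form a ratio.
fit <- lm(meta_d ~ group + d_prime, data = df)
summary(fit)   # the group coefficient is the d'-adjusted difference in meta-d'
```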

The paper attached has the relevant changes tracked - a clean version is uploaded to psyarxiv as well.

Please let us know how you wish to proceed.

Kind Regards,

 

Max

Decision by Rob McIntosh, posted 30 Jan 2022

Dear Mr. Lovell,

Thank you for submitting the further revisions to this Stage 1 RR. The tracked-changes document was helpful in evaluating these.

I do not think it necessary to request a further review from Chris Noone, because you have considered his final comments and provided a rationale for the course of action you have chosen. This rationale seems reasonable to me; in any case, it is publicly available, and so the informed reader will ultimately be able to draw their own conclusions about your primary outcome measure.

However, there is still a non-trivial problem with your measurement of metacognition. First, your introductory explanations are unclear about the distinction between metacognitive sensitivity (meta-d') and metacognitive efficiency (meta-d'/d'). In my understanding, the former estimates the information used by metacognition (and thus may co-vary with the quality of perceptual information available, d'), whilst the latter is a measure of the quality of metacognitive processing itself (i.e. what proportion of the information potentially available to metacognition has actually been exploited). You wish to measure whether metacognition itself has been improved by your key mindfulness intervention, so it is metacognitive efficiency that you are primarily interested in.

Your arguments for why metacognitive efficiency should not be volatile in your design seem perfectly sound to me. Therefore, I don't understand why you do not just retain metacognitive efficiency (meta-d'/d') as your dependent variable. You suggest that your perceptual staircasing method should keep d' stable between participants, so that any variation in meta-d' can be interpreted as a change in metacognitive efficiency. I see the logic, but I also see three problems with this:

1) You are using meta-d' (a measure of metacognitive sensitivity) as a measure of metacognitive efficiency, because you reason that these two things are equivalent under the assumption of constant d'. This convoluted logic is much less clear for the reader than if you simply used meta-d'/d', which is actually a measure of metacognitive efficiency.

2) Exact equivalence would anyway hold only if your staircasing procedure does exactly as you assume, but you would then need to include tests to confirm that this assumption was met. It would be simpler and safer to use meta-d'/d', which automatically compensates for any variations in perceptual sensitivity.

3) Because you now propose to use meta-d', you have changed your targeted effect size to be based on meta-d' from a prior study, rather than on meta-d'/d'. As above, this would be valid only if the staircasing assumption were also true for this study, whereas no such assumption is required for meta-d'/d'.

In summary, I think you should keep meta-d'/d' as your outcome measure for metacognitive efficiency, and that you should revisit all parts of the manuscript discussing these concepts to make sure that you are clear and accurate in the distinctions you draw between metacognitive sensitivity and metacognitive efficiency, and consistent in your application of terminology. This is especially important considering that many people interested in reading this paper may not be familiar with the technical literature on the modelling of metacognition.

I look forward to seeing this issue addressed, either by your changing the plan as suggested, or by a suitable rebuttal of my concerns.

Yours sincerely,

Rob McIntosh

Evaluation round #2

DOI or URL of the report: https://psyarxiv.com/3umz7

Version of the report: v4

Author's Reply, 25 Jan 2022

Download tracked changes file

Dear Rob and Dr. Noone,

Thanks for your replies. I have hopefully matched your requests with a few changes to the study, as listed below (as well as a spell check).

Meta-d'

The ratio meta-d'/d' will not be volatile due to the denominator being at risk of being close to zero, as d' will be consistently far from zero (i.e. about .75) with our staircasing in place. Nor should meta-d' itself be volatile due to false-alarm and hit rates being close to 0, because of the use of Bayesian hierarchical modelling (with Steve Fleming's meta-d' package). Still, our proposed solution is to just use meta-d', as our staircasing should be able to keep d' stable between participants anyway. I have found data for two previous studies I was already using, and extracted the meta-d' means and SEs for the group*time interaction for their perception measures (which are similar to the one we are using). The R code file has been updated, and this and the previous studies' data can be found in the OSF repository. The required sample size for this test is n = 110, so the total sample needed remains unchanged; the H1 estimates have been updated, as has the hypothesis registration table.

GAD-7

We agree with Dr. Noone that reducing mental health to anxiety is too specific. I have re-written the section introducing the importance of mental health in this study. Dr. Noone suggested using a sub-clinical measure; however, data from those used in our pilot were largely insensitive, indicating that a very high N would be needed for such a measure. To broaden the measurement of 'mental health' beyond anxiety, I have put the PHQ-8 back into the pre-registration. To clarify, the PHQ-4 used in the pilot consists of 2 anxiety and 2 depression questions. The GAD-7 and PHQ-8 contain those exact questions, respectively, along with others, as they come from the same research project. This captures two of the most commonly used constructs (amongst perhaps just stress, anxiety, and depression) for evaluating mindfulness interventions.

We have also referenced a source explaining why we model H0 as a spike on zero. You will note the required sample size has gone up slightly - I forgot to re-run the estimations when the imputation method was finalised, but these are updated and accurate now.
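For context on the point-null modelling mentioned here, the snippet below is a generic Dienes-style Bayes factor sketch in R. It assumes, purely for illustration, a half-normal model of H1 with its scale set to an expected effect of 0.2; the obtained effect and standard error are invented values, and the actual model of H1 in the report may differ in shape and scale.

```r
# Bayes factor for H1 (half-normal prior on the effect, scale = h1_scale)
# against H0 modelled as a spike (point null) on zero.
bf_halfnormal_vs_point_null <- function(obtained, se, h1_scale) {
  likelihood <- function(theta) dnorm(obtained, mean = theta, sd = se)
  # Marginal likelihood under H1: average the likelihood over the half-normal prior
  marginal_h1 <- integrate(function(theta) likelihood(theta) *
                             2 * dnorm(theta, mean = 0, sd = h1_scale),
                           lower = 0, upper = Inf)$value
  marginal_h0 <- likelihood(0)     # point null: all prior mass on zero
  marginal_h1 / marginal_h0        # BF > 1 favours H1, < 1 favours H0
}

bf_halfnormal_vs_point_null(obtained = 0.25, se = 0.10, h1_scale = 0.2)
```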

Please let me know if any further changes are needed.

Many Thanks,

Max

Decision by Rob McIntosh, posted 18 Jan 2022

Thank you for your attention to the reviewer comments and for the revised version of your preprint Stage 1 Registered Report. Reviewer#2 (Julieta Galante) was unavailable for a second round of review, but Reviewer#1 (Chris Noone) has evaluated your revisions with respect to the comments of both reviewers, as have I (to the extent that my expertise allows). We are both generally satisfied that the comments have been adequately addressed by your revisions, and that you have provided adequate rationale where reviewer suggestions have not been followed. There remain only a few minor issues that need further attention before IPA can be issued for this study.

Reviewer#1 has asked that you consider a different primary outcome measure for mental health, or that you provide a more specific rationale for retaining the GAD-7.

I think that your objective measure of metacognition needs to be more carefully described. On page 15, you introduce this as ‘metacognitive sensitivity’, but it seems that your actual critical measure is metacognitive efficiency, so it would be good to be clear on this point. Also, you propose to calculate metacognitive efficiency as (meta-d’/d’). This ratio measure can be volatile in edge cases in which d’ is close to zero; do you have reason to think that this will not arise, or any plan to deal with outlying cases? (Perhaps this is handled by the hierarchical Bayesian estimation method?)
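A tiny numeric illustration of the edge case in question (the values are made up):

```r
# When d' approaches zero, the ratio meta-d'/d' blows up even though the
# numerator is unchanged.
meta_d <- c(0.40, 0.40, 0.40)
d      <- c(1.00, 0.20, 0.02)
meta_d / d   # 0.4, 2.0, 20 - the last ratio is driven almost entirely by the tiny denominator
```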

It would be good if you could provide a reference to support your strategy of adopting a point null hypotheses for Bayesian testing.

Finally, Reviewer#1 has emphasised that the whole manuscript needs close attention to grammar, and the elimination of typos. (I have attached the document on which I noted a few of these typos, in case it is helpful.)

If you return your manuscript with an accompanying letter to indicate how comments have been addressed, then I anticipate being able to evaluate these quickly without the need for further external review.

Best wishes,

Rob

Download recommender's annotations

Reviewed by Chris Noone, 14 Jan 2022

In my opinion, the authors have responded satisfactorily to the suggestions of the editor, my fellow reviewer, and me. The revised manuscript is structured in a more accessible manner and provides a strong rationale for testing the effects of the mindfulness of mental states intervention on metacognitive outcomes and the potential of the mindfulness of the world intervention as a placebo. I am in agreement with the authors that this study does not need to be split into one focused on mindfulness and metacognitive outcomes and another on mental health outcomes, and they have done a good job of clarifying the role of the mindfulness and metacognitive outcomes as manipulation checks in the methods section and the relationship between metacognition and mental health in the introduction. I'm glad that there is now a performance-based measure of metacognition and I agree with the changes to the measurement of mindfulness. I'm not entirely convinced by the use of the GAD-7 as the primary outcome measure and an operationalisation of mental health in general. I think it can be retained as the primary outcome measure if it is considered specifically as an operationalisation of anxiety and a rationale for focusing on anxiety is presented. Otherwise, given the sample, a measure of sub-clinical general distress should be used as the primary outcome. I think that this is the only substantive issue that remains to be resolved before this study receives in-principle acceptance, unless the editor and my fellow reviewer notice something that escaped my attention. Otherwise, the manuscript could also do with some proofreading for typos and grammatical errors (especially moving between tenses when talking about the pilot and then the planned study).

Evaluation round #1

DOI or URL of the report: https://doi.org/10.31234/osf.io/3umz7

Author's Reply, 21 Dec 2021

Dear Prof. McIntosh, Dr. Noone, and Dr. Galante, 

Thank you very much for your detailed replies – you’ve all made several points which I agree were much needed additions and changes to the paper. I will start this reply with the points I would like to disagree with, and which I have tried to explain better in the paper, as those are probably the more important points for me to address. Below this section are the changes I have made to the paper which I hope will address all of your other concerns and suggestions. I have passed this reply through Prof. Dienes and we are in agreement with the points made, although if in turn any of you wish to disagree with this reply we are happy to make the suggested changes. Many thanks for your patience with the reply – some of the additions to the paper took some time to learn about and implement.

1)      Separating the metacognitive and control hypotheses 

I believe the main issue taken with the paper is that including both the metacognitive and the control-condition hypotheses is too much for a first study - the practical concern here being that taking metacognition and mental health measures at the same time makes for a busy paper. However, note that, as we reference, the notion that mindfulness is essentially metacognitive is a long-established position, and so the notion shouldn't need piloting for plausibility. What is needed to break any possible vicious cycle is outcome-neutral tests; and so long as they do their job, nothing is gained by separating the research into two successive experiments, but, as you say, an enormous amount of efficiency is lost (and efficiency is important for someone in their third year of a PhD). As Prof. McIntosh partly pointed out, our aim of 150-300 people would be large for a study of this kind.

Perhaps more could be done to clarify that the mindfulness and metacognition measures are our manipulation checks, and mental health is our main outcome. Admittedly, this didn't quite line up with the contents of the paper, which is most concerned with metacognition, and the suggestion that mental health is affected because of this is secondary and less crucial. To this end, I have added a paragraph on the relationship between metacognition and mental health in the introduction.

However, where mindfulness is conceptualised as therapeutic, the worth of an intervention is often judged by these outcomes. Impact, reader interest, and practical applications are something I am concerned about here. Moreover, our study is fairly sensitive to them, as shown by our results on the PHQ-4 anxiety subscale in the pilot. Following from this, in a data-driven approach, we suggested using separate, larger anxiety and depression scales from the same family of scales in the main study (within which all the questions of the PHQ-4 are contained). Reviewer 2 suggested keeping the PSS in the follow-up for sensitivity reasons, as it indexes sub-clinical mental health issues - in the paper I state a similar argument for why it was used in the pilot, namely that it is used in clinical settings. However, the PHQ-4 only uses two questions for each of depression and anxiety and so is less reliable than the larger scales. We wanted scales as reliable as reasonable, to allow fewer subjects to be recruited while still having a decent probability of achieving evidential Bayes factors.

My own feeling is that keeping just the GAD-7 as a main/pre-registered measure would be parsimonious, sensitive, and would still capture this important construct. Ultimately, I feel okay simply presenting these arguments here and leaving it up to the reviewers whether all or some of the GAD-7, PHQ-8, or PSS should be focused on in addition. We have only mentioned exploratory scales in the procedure; they are not pre-registered in the hypothesis table.

2)      Measuring Mindfulness and Metacognition 

We also use two unusual measures not featured in the pilot - the TMS-D (now a trait version, which I missed previously - I assume this should help with many of your concerns) and a new Observe scale. These scales were selected/created as a result of discussions over the pilot data, although their selection is obviously content-based. Our pilot results on the FFMQ were largely inconclusive apart from the Observe scale being higher in the mental states condition - which is strange as, although it does differentiate mindfulness from control in many studies, in the 24-item FFMQ its content refers to worldly observation. Looking at the questions, we could see there were two ways of reading them: as first- and second-order observation of the world. We decided that tweaking this existing scale to separate out these readings would be a strict assessment of our theoretical distinction between conditions. Whilst this measure is unvalidated, it is also exploratory.

Next, we looked at the 4 remaining facets of the FFMQ we were left with as manipulation checks - we reasoned that perhaps they were insensitive/under-powered because their content is far from a precise measure of how we had defined the mental states intervention theoretically. These facets seem to focus on emotional self-control rather than a general metacognitive decentring. Hoping not to have to invent another scale, I looked through all existing mindfulness scales (and many metacognition ones) and felt that the content of the Decentring subscale of the TMS captured the theories we subscribe to in the paper fairly well. Of note, I have found a trait version of this scale that is well tested and have included this in the paper instead. The other subscale of the TMS was irrelevant to our purposes, and the suggestion is that the subscales are analysed separately. If, as we state, metacognition is central to mindfulness, then the literature in general would benefit from wider use of this measure in the way we suggest. I feel its inclusion is a point of methodological strength, and it was chosen precisely because we disagreed (somewhat) with the definition of mindfulness presented by the FFMQ.

I do, however, fully agree that a more objective measure of metacognition should be used - I have implemented the metacognition task seen here and in several other studies from Steve Fleming's lab: https://psyarxiv.com/c4pzj/. This task asks participants to judge which of two boxes contains more dots (with staircasing adjustments used to keep accuracy at 75%), with confidence ratings taken after every trial. Confidence-accuracy correspondence is analysed using the meta-d' method. The methodology (JavaScript/HTML/CSS embedded into a Qualtrics survey) and analysis (MATLAB and R) are set up and ready to go; see the supplementary materials.
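To illustrate the kind of staircasing described above, here is a toy R simulation (not the actual JavaScript task code); the psychometric function, step sizes, starting difficulty, and number of trials are invented assumptions, with a weighted up/down rule chosen so that accuracy settles near 75%:

```r
set.seed(2)
target    <- 0.75      # intended accuracy level
step_up   <- 3         # dots added to the difference after an error
step_down <- step_up * (1 - target) / target   # weighted up/down rule -> ~75% correct
diff_dots <- 30        # starting difference in dot counts between the two boxes
correct   <- logical(400)

for (t in seq_along(correct)) {
  p_correct  <- pnorm(diff_dots / 20)   # toy observer, not a fitted psychometric model
  correct[t] <- runif(1) < p_correct
  diff_dots  <- if (correct[t]) diff_dots - step_down else diff_dots + step_up
  diff_dots  <- max(diff_dots, 1)       # difficulty cannot go below one dot
}

mean(correct)   # hovers near the 75% target over many trials
```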

 

Changes made to the paper:

1)      Write up changes

-          Pilot moved to the supplementary materials and a short description added - any overstatements toned down. We feel the size of the pilot and the use of Bayesian analysis motivate drawing tentative conclusions from the sensitive data obtained, and conclusions should also be withheld when Bayes factors are close to 1, so these issues do not limit the utility of the pilot.

-          Final paragraph of the intro integrated into an earlier point. Also added some words on the importance of mental health to our study.

-          Statement about PWB-Env results being in-line with hypotheses changed.

-          Randomisation has been made clearer: the entire experimental procedure is automated through Qualtrics and so we rely on their randomisation procedure.

-          Frequentist analysis clarified in the pilot – although as our conclusions rely only on the Bayesian approach, we believe these should not be pre-registered for the current study as this would increase analytic flexibility (they will be the same as the pilot, however).

 

2)      Methodological changes

-          Inclusion of an ‘objective’ measure of metacognition (see above)

-          Participants will be allowed and encouraged to complete the post-test survey if they choose to discontinue the study early.

-          Data have been analysed on a multiple-imputation basis. Our dataset had many patterns of missing data, but I hope my application of multiple imputation is well justified. However, the pre-test, condition, and expectation data have not been multiply imputed, due to the sheer amount of missing data making this difficult when a large amount of post-test data also has to be imputed. I have addressed this in the paper. These issues will not occur in the main study (note that the pilot experiment was set up and collected by undergraduates over several years, hence the issues!).

-          The scaling factor for the models of H1 has been recalculated using more studies, although it turned out to remain the same: 0.2 Likert units. R code and data can be found in the supplementary folder.

 

3)      Online supplementary

This includes the following:

-          Pilot study write-up.

-          All scales used (pilot and main study)

-          Course transcripts and audio recordings (pilot and main study)

-          Analysis code written in R (pilot only)

-          Pilot data

-          Data and code for calculation of new priors.

-          Metacognition ‘dots’ program code (both to run ‘out of the box’ and integrate with Qualtrics) – a version of this and the survey can be provided to reviewers if needed.

You might have missed the online supplementary materials linked from the PsyArXiv file: https://osf.io/vu2dk/. Unfortunately this has just been marked as spam by OSF, but I have emailed them, so it should be publicly available again very soon.

Eagerly awaiting your replies!

Many thanks,

 

Max Lovell

Decision by Rob McIntosh, posted 12 Oct 2021

Dear Mr. Lovell,

Thank you for your PCI-RR submission, and for your patience in awaiting this decision. The Stage 1 report has now been evaluated by two reviewers, whose full comments you will find below.

While both reviewers judge the research questions as valid, and well-motivated from theory, Reviewer#2 argues that the two main aims [(1) to test the role of metacognition in mindfulness practice and (2) to test the effects of mindfulness against a true active control] would be more appropriately approached sequentially rather than in a single study. I can see the logic of this idea, because the conceptual basis of the test of (2) is dependent upon the outcome of (1). For instance, the sense of your question (from the registration table) “Does a metacognitive component of mindfulness account for its positive effects on mental health?” is critically dependent upon the outcome of the preceding question “Is mindfulness a metacognitive practice?”, as well as on the question “Do expectations to improve account for differences in outcomes?”. I would encourage you to consider seriously Reviewer#1’s suggestions for splitting and chaining your research aims (perhaps in a programmatic RR).

This suggestion seems also to be consonant with Reviewer#2’s view (point 1B) that more performance-based measures of metacognition (i.e. not just TMS Decentring) would be required to convincingly test the idea that mindfulness is a metacognitive practice. Perhaps you are trying to do too much in one shot here, and should focus either on the practical question of whether MMS works better than MW, or on the more basic question of which active ingredients differ between the two active treatments. To avoid trying to do too much at once, certain measures (e.g. TMS Decentring) could be accorded a more exploratory status (i.e. include the measure, but develop its analysis at Stage 2 rather than Stage 1).

However, I can also appreciate that the resource implications of a study this size make it practically desirable to try to address multiple aims in parallel. To the extent that you do wish to include multiple aims, you should strive for clarity in delineating these, and the conceptual relationships and inter-dependencies between them. At present, the description of your manipulation checks and non-crucial questions is heavily interwoven with the description of your main hypotheses. It would be clearer for the reader if the manipulation checks could be clearly separated out, both in the text and in the registration table. I also suggest, for simplicity, that any non-crucial hypotheses should be left out of the Stage 1 report and added at Stage 2.

Another major comment made by Reviewer#2 is that too much space is given to the reporting of the pilot study, and that this is not justified by the strength of the pilot study itself and does not benefit the Stage 1 report in terms of centring the main study plan. It would be quite a major job in itself for the reviewers to review the pilot study as a full experimental report, and this does not seem necessary for the advisory role that the pilot data are playing here. I would agree with Reviewer#2 that putting the report of the pilot study into an appendix, or online supplementary material, would be a potential solution.

More complete sharing of materials may also be necessary to enable full replication of the planned study (this would presumably include all written/audio materials for the mindfulness treatments).

Neither reviewer has expertise in Bayesian approaches, so neither commented extensively on the Bayesian analysis plans. I have some degree of familiarity here, although I am by no means an expert. The analyses themselves look well specified, but the choice of priors could do with a more detailed explanation. The expected effect size is based on prior literature, but seems to be based exclusively on ref 72 (Cavanagh et al.), and not informed by the other studies mentioned, nor apparently by the pilot study (the models of H1 are the same as those used in the pilot).

The reviewers make a number of other key suggestions and comments that you should respond to. Neither commented much on the Introduction, except to say that the plan was generally well motivated. Whilst I agree with this, I think that the structure of the Introduction could be improved. At present, the transition from the prior literature into the present study is very abrupt. The final paragraph before the pilot (p6-7) introduces new material that does not seem to flow well at this point; if this material is to be included, it should be included earlier. If you follow Reviewer#2’s recommendation of moving the full pilot report to the supplementary materials, then a textual summary of the pilot work may provide a more natural transition into the main experiment.

You should give careful consideration, and a full reply, to all reviewer comments. We look forward to receiving your revision, if you choose to submit one.

Best wishes,

Rob McIntosh

Reviewed by , 26 Aug 2021

1A. The scientific validity of the research question(s). 

For me, the report passes this criterion. The research questions regarding the differences between a waitlist condition and the mindfulness of the world and mindfulness of mental states interventions flow logically from Higher Order Thinking theory, from the existing evidence base on active-control conditions in mindfulness research, and from the evidence on the effects of mindfulness-based interventions on metacognitive, mindfulness, and mental health outcomes. As noted under the next criterion, there may be a case for narrowing the breadth of the research questions so that there is a closer mapping between them and the hypotheses.

1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable. 

I consider the report to have passed this criterion too. The hypotheses are precisely specified and the estimands for each hypothesis are clearly and logically derived from the theory and evidence presented. There is a clear plan for the thresholds that will be considered evidence for or against the hypotheses. The pilot study demonstrates that the hypotheses are testable and provides important indications of how to optimise the design of a confirmatory study. While the hypotheses are certainly relevant to the research questions, the research questions may be too broad to allow the hypotheses to provide a direct answer. Perhaps the research questions could be refined to be more specific. For example, I’m not convinced that, should there be no difference in TMS Decentering scores between the two active conditions (i.e. the third hypothesis), this would be strong evidence against mindfulness being a metacognitive practice (i.e. the third research question), especially without additional evidence from performance-based measures of metacognition.

1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).  

I may not have enough expertise in Bayesian analysis to properly judge, but given the information provided I am happy to consider this criterion passed. The experimental design is feasible, as demonstrated through the pilot study, and allows the hypotheses regarding comparisons between the three independent conditions to be tested with a high degree of internal validity through features such as randomisation, clear data inclusion criteria, and the monitoring of expectations. The study also has some features that promote external validity, such as the similarity of the active conditions to widely available mindfulness-based interventions delivered online or through smartphone applications, but the sampling approach will limit generalisability (though this need not be an aim for this research programme yet). While I am not an expert in Bayesian approaches to power analysis and hypothesis-testing, the analytic approach described in this report is clearly explained and justified.

1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.

In my opinion, it is currently not possible to replicate the proposed study with the materials provided, so I have reservations regarding this criterion, but these would be resolved if the surveys and analytic code were included too. Having said that, the measures and proposed analyses are described in detail and would allow approximate replication. Access to the surveys and analytic code would ensure that, for example, the same order of measures is used and that the same R packages and functions are used to implement the analyses. Since the code for analysing the pilot data is provided, I presume that this could easily be adapted to the proposed study. The hypothesis registration table is clear as it illustrates the mapping of each research question to its corresponding hypothesis and provides detailed and invariable information about how these hypotheses will be tested and how those results should be interpreted. I am confident that the design and analytic strategy minimise the opportunity for researcher flexibility.

1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s). 

I think the report passes this criterion. The report proposes a positive control for mindfulness-based interventions and presents pilot data that tentatively supports the proposed positive control condition. The proposed study will also include other manipulation checks focused on expectations and the amount of time taken to complete each active condition. Adherence is an aspect of intervention fidelity (often termed “treatment receipt”) that mindfulness-based interventions often assess. It may not be assessable in the current design of the proposed study due to the data inclusion criteria but it is not vital to change the design to allow the effects of adherence to be investigated. The only aspect of the manipulation checks that I would question is the reliance on the TMS Decentering scale as a measure of metacognition. Given concerns that enhanced metacognition may actually lower scores on self-reported dispositional mindfulness questionnaires, I wonder whether it would be better to use a performance-based measure of metacognition, if there is one that can be implemented online given the resources available to the researchers.

Reviewed by , 11 Oct 2021