Review for Stage 1 Registered Report: „Stress regulation via being in nature and social support in adults - a meta-analysis“ by Sparacio et al.
The proposed study aims to provide a comprehensive analysis of the state of evidence concerning two interventions for stress regulation: „being in nature“ and „social support“. Generally, I think this is a worthwhile endeavor, with high relevance for both basic research and application. It fits well with the existing meta-analyses on two other interventions and promises to make an incremental and substantial contribution to the field.
I am no expert in the field of stress research and my review is focused on methodological aspects of the Stage 1 manuscript.
The authors demonstrate a good acquaintance with the current state of the discussion regarding bias correction. The proposal is far removed from the „naive“ meta-analyses that can still be seen in the literature. Instead, the authors propose a vast array of estimation techniques, robustness checks, and bias assessments that are state of the art (some of them could even be considered „experimental“, as the field has gathered little experience with them so far).
I have some comments on the proposal, put forward in a constructive mindset with the hope that they make the proposal even stronger. All of them should be easy to address. I roughly ordered them by importance (in decreasing order). In particular, the aspects addressed in points 1 and 2 do not yet have the stringency necessary for a preregistration / RR Stage 1 protocol.
1. Estimation techniques:
1. Multilevel modeling: What if k is very small, e.g. in a subgroup analysis - is a multilevel RE model still appropriate? What if only one study contributes multiple effect sizes (ES) and all others contribute only one? That could break the MLM estimation. I wonder whether you need a fallback strategy for that analysis.
2. Sensitivity analyses: I appreciate the aim to do many sensitivity analyses. But I wonder: are there too many? Do you plan to cross all dimensions of possible variation, or do you always fix the others to one (sensible) value while varying one focal dimension? I envision that reporting, summarizing, and interpreting this plethora of checks will be strenuous and maybe messy.
3. An MLM selection model should be possible in principle, but I haven’t seen that yet. So the permutation approach is a viable workaround. The open data will allow re-estimations when new methods are available.
4. „To use a measure of precision that is uncorrelated with the effect size, we used √(2/N) and a 2/N terms instead of standard error and variance for PET and PEESE“ -> Please provide a reference or a rationale for why you chose this approach (and thereby deviate from the standard procedure).
5. „Additionally, we also used the 4PSM as a conditional estimator for PET-PEESE“ -> How does that work? Conditional on what? I did not understand this.
6. You have a fallback strategy from 4PSM to 3PSM, depending on the number of p-values in each bin. What happens if there are <4 p-values in the 3PSM bins? (This can easily happen under strong publication bias.)
7. How do you do inference in the permuted 4PSM models? I understand that the median estimate is used for interpretation, but how is inference done?
8. „If the results of the 4-parameter selection model disagreed with the more general Bayesian model-averaging approach“: What is the inferential criterion for the RoBMA results? The HDI? A BF? How is „disagree“ defined? What if both show „significant“ positive results, but disagree in magnitude? I think the final „inference algorithm“ should be defined more clearly in the preregistration. Currently, it seems to leave a lot of researcher degrees of freedom.
2. Inference: I think it should be specified more clearly and more stringently which models are used for interpretation and inference. E.g., you write „To estimate the range of effect sizes that can be expected in similar future studies, we calculated the 95% prediction intervals. For each analysis we conducted, when the included effects (k) were less than 10, we did not interpret the estimates.“ -> this relates to the non-bias-corrected model. PET-PEESE is not used at all for inference (except as a part of the RoBMA approach). Could you give some justification for why you rely on 4/3PSM, ignore PET-PEESE, and use RoBMA as a „validator“? (To be clear: I think this is a reasonable approach, but some justification for the reader would be nice. Maybe also mention that RoBMA is quite a new approach that probably has not been fully vetted by independent experts and has not been stress-tested in practice.) To summarize: I think inference and interpretation should be based on the same model. Make clear what the status of the non-bias-corrected results is: Are they reported just for completeness? Or will they be interpreted? Why not interpret the bias-corrected estimates, which are the basis for inference?
3. Text order. It was confusing to read about the bias assessment *after* reading about the fact that studies will be excluded based on that assessment - maybe shift that section before the analysis section?
4. Exclusion criteria: Do you also exclude studies with inconsistent n? If studies are excluded based on risk of bias etc.: Are they a priori excluded for all analyses, or do you look at this assessment as a moderator (e.g. to show inflated ES in biased studies)?
5. Why these two interventions? I understand that you have to start somewhere, but it would be interesting to know what guided the choice. Are these interventions the most often applied? Do they provide the strongest evidence so far?
6. Personality traits as moderators. This is mentioned on p. 3, but never picked up again. Why would you expect such a moderation? Is that incorporated in your analysis in any way?
7. Consequences of stress: If I understood correctly, the authors would also include studies that do not include one of the three components of stress (the mediator), but only consequences of stress. Then, I guess, they have to include the entire literature on depression, well-being, and much more, as „affective consequences of stress“ can encompass a great deal. I am not sure whether under these conditions the scope of the meta-analysis is defined clearly enough. What if studies measure well-being (as a consequence), but otherwise have no relation to „stress“ as the mediating factor, neither by measuring it, nor theoretically? Would that fulfill the inclusion criteria? (I am aware that „stress“ always is included in the search term, but the primary study still could be quite distant).
8. Hypotheses: I am not sure if it is necessary to formulate hypotheses, given that the focus is on estimation. Sure, at the end p-values will be computed and reported; but I think the hypotheses could be dropped without much/any loss.
9. Inclusion criteria: To be clear: Do you only include experimental studies?
10. Subgroup analyses: Why k=10 as the cutoff? How did the authors arrive at that number?
11. Existing meta-analyses: Maybe it would be interesting to report a reproduction attempt of the existing meta-analyses (in particular when they were conducted by other authors). Did you extract the same effect sizes? Do you arrive at a comparable estimate/conclusion? Although the new, more encompassing MA supersedes the old MAs, it would be interesting to see to what extent the earlier results are reproducible.
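To make the question in point 1.4 concrete, here is a minimal Python sketch. It assumes a two-group design with equal group sizes and that N in the quoted passage denotes the per-group sample size; both are my assumptions, and the manuscript may define N differently. Under the usual large-sample approximation, the sampling variance of a standardized mean difference d is 2/n + d²/(4n), so the standard error depends on d itself, whereas the 2/n term does not. The function names are hypothetical.

```python
import math


def smd_variance(d: float, n_per_group: int) -> float:
    """Large-sample sampling variance of a standardized mean difference
    for two groups of equal size n (assumed design)."""
    n = n_per_group
    return 2.0 / n + d**2 / (4.0 * n)


def sample_size_variance(n_per_group: int) -> float:
    """Sample-size-only variance term 2/n; it does not depend on d."""
    return 2.0 / n_per_group


if __name__ == "__main__":
    n = 20
    for d in (0.0, 0.5, 1.0):
        se = math.sqrt(smd_variance(d, n))
        se_n = math.sqrt(sample_size_variance(n))
        # The standard SE grows with d; sqrt(2/n) stays constant.
        print(f"d={d:.1f}  SE={se:.4f}  sqrt(2/n)={se_n:.4f}")
```

If this is indeed the rationale, a sentence to that effect plus a reference in the manuscript would settle the point.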
* p. 3: „We intend to shed light on the mechanisms underpinning stress regulation by employing a workflow incorporating various publication bias-correction techniques“ -> How can the latter shed light on the former?
* It might be helpful to explicitly state that the authors (of course) include all studies from the existing meta-analyses.
* p. 7 „For emotional social support we conducted two additional subgroup analyses: The type of social support (e.g., physical) and the source of social support (e.g., known person or stranger).“ -> Are the examples exhaustive? Can you already define what the subgroups will be? From a preregistration point of view, this would be desirable. Or write explicitly that the categories are not fixed yet and will be created during the coding phase.
* p. 10 „For the affective consequences of stress, we used the same procedure we used for the affective components of stress.“ -> I am not sure which procedure this sentence refers to.
* p. 11: Excluding studies in which participants were below 18 years of age: Does this apply if any participant is underage, even if it is only one?
* p. 11: „Namely, for being in nature, we excluded studies in which participants engaged in physical activities besides walking (e.g., running or exercising).“ -> What if the control group is „running indoors“ (vs. running outdoors)? Shouldn’t that be eligible?
* p. 12: „the number of citations of the paper“ -> According to which database?
* p. 15: „by varying the assumed severity of bias, modeling moderate, severe, and extreme selection.“ -> How did you model this? Readers should not have to look into the code for that information.
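Returning to point 1.6: the fallback rule could be written down as an explicit decision procedure. The following is a minimal Python sketch, not the authors' implementation. The cutpoints (one-sided p = .025 and .5) and all function names are my assumptions for illustration; only the minimum of 4 p-values per interval is taken from my reading of the protocol. The function returns None in exactly the case that the protocol currently leaves open.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def bin_counts(p_values: Sequence[float], cutpoints: Sequence[float]) -> List[int]:
    """Count p-values in the intervals (0, c1], (c1, c2], ..., (ck, 1]."""
    counts = [0] * (len(cutpoints) + 1)
    for p in p_values:
        # bisect_left puts a p-value equal to a cutpoint into the lower interval.
        counts[bisect_left(cutpoints, p)] += 1
    return counts


def choose_selection_model(p_values: Sequence[float],
                           min_per_bin: int = 4) -> Optional[str]:
    """Pick the richest selection model whose intervals are all populated."""
    # Assumed 4PSM cutpoints: .025 and .5 (one-sided p-values).
    if min(bin_counts(p_values, [0.025, 0.5])) >= min_per_bin:
        return "4PSM"
    # Assumed 3PSM cutpoint: .025 only.
    if min(bin_counts(p_values, [0.025])) >= min_per_bin:
        return "3PSM"
    # Neither model is estimable under the rule; the protocol should say what happens here.
    return None


if __name__ == "__main__":
    strongly_selected = [0.01] * 10 + [0.2] * 2
    print(choose_selection_model(strongly_selected))
```

Stating the rule at this level of detail in the Stage 1 protocol, including the final branch, would remove the ambiguity.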
Felix Schönbrodt (signed review)