Recommendation

Understanding the role of health condition, aetiological labels, and attributional judgements in public stigma toward problematic substance use

based on reviews by Nicholas Sinclair-House and Roger Giner-Sorolla
A recommendation of:

To help or hinder: Do the labels and models used to describe problematic substance use influence public stigma?

Submitted: 28 October 2021, Recommended: 25 January 2022


People suffering from substance misuse problems are often stigmatised. Such public stigma may impair both their obtaining help and the quality of help they receive. For this reason, previous research has investigated factors that may reduce stigma. Evidence has been found, though not consistently, that labelling the condition a "chronically relapsing brain disease" rather than a "problem" reduces stigma, as does framing it as "a health concern" rather than "drug use". Another potentially relevant difference that may explain the discrepant previous results is whether the description mentions how effective treatment can be.

In this Stage 1 Registered Report, Pennington et al. (2022) describe how they will investigate whether any of these factors affect two different measures of stigma used in previous work, with a study well powered for testing whether the 99% CI lies outside or inside an equivalence region. While a CI falling outside the region will straightforwardly justify concluding there is an effect of interest, a CI within the region will need to be interpreted with due regard to the fact that some effects within the region may still be interesting.

The Stage 1 manuscript was evaluated over two rounds of review (including one round of in-depth specialist review). Based on comprehensive responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).

URL to the preregistered Stage 1 protocol: https://osf.io/4vscg

Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.


References

Pennington, C. R., Monk, R. L., Heim, D., Rose, A. K., Gough, T., Clarke, R., Knibb, G., & Jones, A. (2022). To help or hinder: Do the labels and models used to describe problematic substance use influence public stigma? Stage 1 preregistration, in principle acceptance of version 2 by Peer Community in Registered Reports. https://osf.io/4vscg

Cite this recommendation as:
Zoltan Dienes (2022) Understanding the role of health condition, aetiological labels, and attributional judgements in public stigma toward problematic substance use. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=144

Evaluation round #2

DOI or URL of the report: https://osf.io/dk694/

Version of the report: v2

Author's Reply, 21 Jan 2022

Decision by , 17 Jan 2022

Dear Charlotte

I now have two reviews which are largely positive about your submission. Sinclair-House asks you to comment on the difference between alcohol and opioids in previous studies and the bearing this may have had on different results; and Giner-Sorolla asks you to comment on the quality of the attributional manipulation if it weren't simply trying to resolve differences between previous studies.

Giner-Sorolla raises questions about Bonferroni corrections (interestingly, I made almost exactly the same point about Bonferroni here: https://psyarxiv.com/pxhd2, pp. 19-20). On the other hand, given that you allow yourself some interpretational flexibility in choosing which measure is better simply by whether it yields the conclusion of a difference, I think some statistical conservatism is fine.

You still have a two-step procedure which is inferentially incoherent, i.e. testing whether the mean is within or outside the equivalence region only after a significant result against the H0 of no effect. Why not simply test whether the x% CI is completely outside or within (or only partially within) the equivalence region? That is, drop the initial significance test against the H0 of zero effect, an H0 you have implied is of no relevance by claiming there is a minimally interesting effect size.
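As a concrete illustration (not part of the authors' protocol), the one-step interval rule could be sketched as follows; the equivalence bounds of ±0.2 and the example CIs are illustrative assumptions only:

```python
# Hypothetical sketch of "inference by intervals": classify an effect by
# where the x% CI falls relative to an equivalence region, with no
# preliminary significance test against H0 = 0.

def interval_decision(ci_lower, ci_upper, eq_lower=-0.2, eq_upper=0.2):
    """Classify an effect from its confidence interval alone."""
    if ci_lower > eq_upper or ci_upper < eq_lower:
        return "effect of interest"      # CI entirely outside the region
    if eq_lower <= ci_lower and ci_upper <= eq_upper:
        return "no effect of interest"   # CI entirely inside the region
    return "inconclusive"                # CI straddles a boundary

print(interval_decision(0.25, 0.60))   # entirely above +0.2
print(interval_decision(-0.10, 0.15))  # entirely within [-0.2, 0.2]
print(interval_decision(0.05, 0.30))   # straddles the upper bound
```

The rule delivers one of three conclusions from the interval alone, avoiding the incoherence of conditioning an equivalence test on a prior significance test.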

For RQ3 in the design table, do not label it as exploratory (to keep things clean, don't describe any exploratory analyses in the Stage 1); for the theory at stake, simply state the broadest claim that is at stake given your findings (regardless of whether past research has looked at this or not, which is not relevant to whether this study tests that claim). In the row for RQ3, be clear that this is testing a difference of differences.

Finally, as we have talked about, justify minimal effects of interest by their relevance to the theory tested (in its scientific context) rather than by researcher resources, which have no inferential relation to what a theory predicts.

best

Zoltan

Reviewed by , 22 Dec 2021

I believe this RR satisfies the relevant criteria for Stage 1 review.

The scientific validity of this research question is clearly demonstrated, particularly given the seemingly conflicting findings of the studies upon which it draws. The logic and rationale of the study are coherently outlined and appear credible. The proposed hypotheses are appropriate and the research falls within established ethical norms in the field.

I note the authors' response to comments at the previous review stage and the changes made in light of these. Where changes have been made, these have been beneficial and made the proposed analysis plan more rigorous. Where the decision has been made not to implement a change in line with the reviewer's comments (e.g. minimal effect of interest), this decision seems justified as a balancing of desired statistical power with the practical implications of required sample size. The proposed methodology appears to be appropriate in context and sufficiently detailed to be reproducible.

Whilst I would not necessarily expect to see it addressed in great detail, it is worth noting that one potentially important difference between the Kelly et al. and Rundle et al. approaches which may link to stigma is the use of substances with differing legal and social statuses (opioids and alcohol). Unlike the majority of addictive drugs, alcohol is easily and widely available (and widely used). Leaving aside the separate question of whether or not that should be the case, it is worth reflecting on the extent to which alcohol being the socially-acceptable face of recreational drug use impacts stigma surrounding its (mis)use. That may prove to be a relevant consideration if you find the suggestion of an effect where Rundle et al. did not.

Reviewed by , 16 Jan 2022

This proposal is a fairly focussed attempt to resolve an apparent discrepancy between two studies asking whether describing drug addiction as a disease improves attitudes and reduces stigma. Differences between the studies are analysed and manipulated directly. The stigma measures are reasonable and the indirect discrimination measure by means of financial reward and punishment is an interesting touch.

I think the comments of the editor have been answered thoroughly and in a well-informed way and I am convinced that we have enough statistical power to answer questions of interest.

My one sticking point, though, concerns the necessity for corrections for multiple comparisons. As usually applied, Bonferroni corrections address an H0 that is not necessarily of interest, namely that all tests included are null. They are usually justified as a way to guard against one-shot "fluke" findings of significance that come about only because too many chances were taken. However, this pattern should also be evident from an inspection of the space of all findings, and from an honest summary of them. Thus, the conclusion of a study where only one out of six hypothesis tests is confirmed should not be "this one test confirmed our hypothesis, so it is true."

In addition to Bonferroni corrections being mathematically unsuited for correlated effects, they can also lead to absurdity -- concluding no evidence overall when each of five tests goes in the same direction and is between p = .02 and p = .05, for example. The Holm method is better suited for error control, for one, but I would prefer that the reader decide whether or not correction is applied, by modifying the threshold of significance rather than the p-value itself. For further reading see Mark Rubin's 2021 paper in Synthese.

I also should observe that the manipulations, in particular of attributional judgment, are fairly tightly focussed on resolving the conflict between the two previous studies. Looked at independently, the attributional judgment manipulation isn't that clean or obvious as a manipulation only of attributional judgment. I understand that it is derived from the wordings used in the two previous studies but I think the limitation of this approach should be acknowledged. 


Evaluation round #1

DOI or URL of the report: https://osf.io/zd754/

Author's Reply, 05 Nov 2021


Please see the attached response to the Editor.

Decision by , 02 Nov 2021

Before sending out for review I just want to check a few statistical matters.

1) Your N is set by a power calculation for testing against an H0 of no effect. But you wish to interpret non-significant results with equivalence tests. That means you have one system of inference for asserting there is an effect, and a different one for asserting there is no effect of interest. This can lead to contradictions, e.g. a significant test against no effect would have led to a conclusion of no effect of interest if equivalence testing alone had been done. You may wish to use a system that is more consistent. For example, you could just use hypothesis testing against no effect with high power; or you could just use equivalence testing, generalized as an "inference over intervals", such as the rule that if a 90% CI is completely within the equivalence region there is no effect of interest, and if completely outside there is an effect of interest. If using inference over intervals, the relevant way to estimate N would be the N needed for the CI to fall within the equivalence region e.g. 90% of the time when there is no effect, and outside of it 90% of the time when there is the predicted effect (see https://psyarxiv.com/yc7s5/).
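The sample-size logic for inference over intervals could be explored by simulation; the sketch below (not the authors' power analysis) estimates, for assumed per-group sizes, how often a 90% CI for a standardised mean difference falls entirely inside an equivalence region of |d| < 0.2 when the true effect is zero. All numbers are illustrative assumptions:

```python
# Rough simulation of sample size needed for a 90% CI to land inside an
# equivalence region a desired proportion of the time under H0: d = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def prop_ci_within(n, true_d=0.0, eq=0.2, level=0.90, sims=2000):
    """Proportion of simulations where the CI lies inside (-eq, eq)."""
    hits = 0
    z = stats.norm.ppf(0.5 + level / 2)  # normal critical value
    for _ in range(sims):
        a = rng.normal(0.0, 1.0, n)       # control group, SD = 1
        b = rng.normal(true_d, 1.0, n)    # treatment group
        d = b.mean() - a.mean()           # difference in SD units
        se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
        if -eq < d - z * se and d + z * se < eq:
            hits += 1
    return hits / sims

for n in (200, 400, 600):
    print(n, prop_ci_within(n))
```

The proportion rises with n; the target N is the smallest n at which it reaches, say, .90 (and an analogous simulation with true_d set to the predicted effect gives the N needed for the CI to fall outside the region).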

2) You define d = 0.2 as your minimal effect of interest. Why wouldn't d = 0.11 be of relevance to the theory being true, given you cite such an effect as one obtained in past studies and, I presume, taken seriously as an effect of interest? Is there really the same minimal effect of interest for all contrasts and DVs? (That would imply that, e.g., all DVs have the same reliability.) (See the paper just referenced.)

3) You say main effects will be followed by Bonferroni-corrected t-tests; I am not sure what these would be given df = 1 for all main effects. Specify the family of tests and what the correction will be, if you are going this way. What I actually recommend is that you pick the one contrast that best tests each theoretical claim, and stick with that; i.e. each row in the design table has one test aligned to the substantive theoretical claim at stake. Other tests can be exploratory and reported as such in the Stage 2.

4) There is some flexibility in drawing conclusions given that a number of DVs are used. Is it possible at this point to relate different DVs to different theoretical questions, i.e. to make clear what conclusions you would draw given different patterns of outcome, and how that relates to the main theory?

5) When you say non-significant effects will be followed by equivalence tests, did you mean the interaction effects as well? Or, put another way, if you are going for inference by intervals, presumably you will use that same inferential method for interactions and main effects?

best

Zoltan
