
Scoping review of quality appraisal and risk of bias tools and their relevance for behavioral sciences

Gold in, gold out. Quality appraisal and risk of bias tools to assess non-intervention studies for systematic reviews in the behavioural sciences: A scoping review
Recommendation: posted 24 February 2025, validated 24 February 2025
Culina, A. (2025) Scoping review of quality appraisal and risk of bias tools and their relevance for behavioral sciences. Peer Community in Registered Reports. https://rr.peercommunityin.org/PCIRegisteredReports/articles/rec?id=884
Recommendation
Level of bias control achieved: Level 4. At least some of the data/evidence that will be used to answer the research question already exists AND is accessible in principle to the authors (e.g. residing in a public database or with a colleague) BUT the authors certify that they have not yet accessed any part of that data/evidence.
List of eligible PCI RR-friendly journals:
- Advances in Methods and Practices in Psychological Science *pending editorial consideration of disciplinary fit
- Collabra: Psychology
- Peer Community Journal
- PeerJ
- Royal Society Open Science
1. Batinović, L., Pickering, J. S., van den Akker, O. R., Bishop, D., Elsherif, M., Evans, T. R., Gibbs, M., Kalandadze, T., Staaks, J., & Topor, M. Gold in, gold out. Quality appraisal and risk of bias tools to assess non-intervention studies for systematic reviews in the behavioural sciences: A scoping review. In principle acceptance of Version 3 by Peer Community in Registered Reports. https://osf.io/4gy5b
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #2
DOI or URL of the report: https://osf.io/ba2xk
Version of the report: version 1
Author's Reply, 27 Jan 2025
Dear Antica,
Thank you for your quick response.
We respond point-by-point in the attached document.
We updated the OSF with the latest manuscript version, including tracked changes. A clean version is also available there.
Best wishes,
Marta Topor
Decision by Antica Culina, posted 17 Jan 2025, validated 19 Jan 2025
Dear authors,
Thank you for revising your manuscript following the reviewers' comments - I have only a few minor comments, which I think you can address within less than an hour of work. Once these are implemented, I will recommend the report.
1) Unclear meaning of the word 'must' in the sentence 'Behavioural scientist seeking to...'. Does 'must' mean that they have no other option if they do want to conduct a RoB assessment, or is it more like 'should', as in that would be the best thing to do? If the former, I think 'would have to' would be better wording; if the latter, 'should' would be the better option.
2) In the sentence 'Crucially, biased studies are then...', is 'outcomes' the best word to use here? Primary studies usually define outcomes, but for reviews or meta-analyses, I think a better term would be 'results' or 'conclusions'.
3) In the sentence 'It is therefore important that systematic review...', it would be better to say 'risk of bias OF each primary study'.
4) In the sentence 'Instead, risk of bias and methodological quality...' you state that tools should allow for an informed judgment on the scale and severity of the problem. However, to my knowledge, most of these tools do not quantify the problems but simply detect that there is a risk of bias. How large the risk is, and how it influences the results, is usually beyond the scope of these tools. Maybe your intention here was something else?
5) In the sentence 'The scientific process presented many challenges, and realistically...' I think you can remove 'in their entirety'.
6) In the criteria for the full-text screening, point 3, could this be 'one or more' of the following?
7) In the text, please just state the number of benchmark articles you have used ('with a reference set of xx articles').
Evaluation round #1
DOI or URL of the report: https://osf.io/ba2xk
Version of the report: version 1
Author's Reply, 16 Jan 2025
Decision by Antica Culina, posted 09 Oct 2024, validated 09 Oct 2024
The intended research would be important, as it addresses risk of bias (RoB) assessment in evidence synthesis for the behavioral sciences. Without RoB assessment, the results of an evidence synthesis (and thus its conclusions) have an increased likelihood of being misinformative.
As I obtained only one review for this submission, I have carefully read the report myself and provided several comments as annotations in the text (see the attached PDF).
I find the report well-written, and the methodology thoroughly developed.
I provide comments/suggestions as yellow highlights (with a comment) in the PDF of the protocol. In general, I miss a clearer distinction between risk of bias, methodological quality, and reporting quality (transparency). While biases are nicely defined (and described) later in the text, it should be clearly stated at the beginning of the introduction that there is a distinction between the risk of bias (which usually refers to internal biases of a study), the biases of the overall evidence base, and reviewer biases, and that this work is primarily focused on internal (within-study) biases.
Further, it should be clear from the start what is meant by 'quality'. Even later in the text, I did not find a clear definition of quality, and sometimes, as written, it seems to be treated as equal to risk of bias (see my marks in the text). I would not agree with this: bias is a systematic deviation from the truth (which is actually never stated explicitly in the text), whereas quality, while it can be linked to bias, can also relate to, e.g., sample size (which might affect the confidence intervals of the estimates) or the use of the best possible methods to measure the outcome (other methods are not necessarily biased, but may be less precise). Finally, there is the reporting quality (or transparency) of primary studies. At least in ecology, CATs will cover some (or all) of these aspects, so I feel it would be important to highlight this early on in the introduction. I think it would also help to specify 'methodological quality' whenever the word 'quality' is used in relation to methodology (otherwise it could be read as, e.g., reporting quality).
On page 8, it is stated that particular attention will be given to items that assess transparency. I am unsure whether this is something that relates to RoB (other than enabling assessment of RoB). I would like to see a better justification for including these in the RoB or methodological quality assessment (which is the main aim of this mapping effort). On the other hand, if these are interesting from other perspectives (e.g. understanding the transparency of a certain evidence base), then I am unsure what recommendations will be given for them once your research is finished. Would you recommend that they be included in CATs? I feel that RoB tools should be as simple as possible, as they are then more likely to be used; adding extra elements to assess will likely affect the uptake of the tool.
In the section describing literature bias, I see that the focus is mostly on publication bias. However, at least in my field of study (ecology), we also need to consider other biases in the evidence base (e.g. geographic, taxonomic), which will obviously also affect the inferences made from the meta-analysis. I do not know whether this applies to behavioral studies too.
The protocol is detailed, especially when combined with the additional information available on the OSF. I am, however, unsure whether some of this information would be useful to have in the Registered Report itself.
Reviewed by Alejandro Sandoval-Lentisco, 20 Aug 2024
Thanks for the kind invitation to review “Gold in, gold out. Quality appraisal and risk of bias tools to assess non-intervention studies for systematic reviews in the behavioural sciences: A scoping review”.
I think this study would address a very important issue. In systematic reviews of RCTs, assessing primary studies with tools such as Cochrane's Risk of Bias is very common. However, outside this context, including such assessments is much less frequent. This may be due to a lack of applicable tools or a lack of awareness among the authors conducting the reviews. Therefore, I believe that conducting a review to highlight what tools currently exist, what aspects these tools assess, and in what contexts they could be used would be very valuable for authors. I also believe that it will serve to identify aspects that should be examined but are not currently assessed by existing tools.
Overall, I think the protocol for the scoping review is very well designed. I commend the authors for their efforts to be transparent and to increase the reproducibility of their work. I also liked very much the comprehensive checklist of items to be evaluated for each tool. There are some points that I think the authors could reconsider, although I do not think it is mandatory to make these changes and I would also be happy if the authors argue why it would be better not to make them.
Regarding the introduction, I believe that the outline on the three types of biases is very illustrative. It explains the potential sources of bias that a meta-analysis may have and mentions the tools or guidelines that can be followed to address them. However, I notice that the 'Study bias' section is as extensive as 'Literature bias' and 'Researcher bias.' I was wondering if it might be appropriate to expand the 'Study bias' section by introducing some key concepts when evaluating the quality of primary studies. For example, I think it could be helpful to already introduce the concepts of construct, external, internal, and statistical validity, as well as provide some examples. It could also be mentioned that some tools cover certain aspects but not all—for example, the Cochrane Risk of Bias scale focuses primarily on internal validity (Hartling et al., 2009).
Other considerations regarding the 'study bias' section: when mentioning concepts like 'selection bias', 'interviewer bias', or 'citation bias', it might be helpful to add some references to studies that discuss these biases in more detail. However, I don't quite understand why citation bias is placed in the analysis stage. I think it might correspond more to the discussion stage (where only studies that confirm certain results are cited). I also don't fully understand placing "analytical flexibility" there. I think it could be more appropriate to describe it as "inadequate use of statistical tests", which might occur when there is analytical flexibility.
Regarding data extraction, the authors mention that they have piloted this process. If I understand correctly from the shared files, this coding has been done for one tool (ROBINS-I). The procedure to be followed is: “items from each tool will be extracted by one reviewer and items marked with an asterisk will be independently validated by a second reviewer, who will focus solely on these tagged items”. I wonder whether it might not be more appropriate to double code all items for at least a percentage of the tools (e.g. 25%, although this might depend on the total number of tools encountered) to ensure that all items are being well understood (the pilot has only been done with one tool). Of course, I understand that this would be more costly, and that it may be a waste of time to perform double coding if the inter-rater reliability is very high, but perhaps it could avoid errors. I leave it to the authors to decide.
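Purely for illustration, a minimal sketch of how such a check might look, assuming a hypothetical extraction sheet (extraction_items.csv) with one row per tool-item pair and placeholder columns tool_id, coder1, and coder2 holding each rater's binary judgement; none of these names come from the authors' materials:

```python
# Sketch: estimate inter-rater agreement (Cohen's kappa) on a ~25%
# double-coded subset of tools. File and column names are hypothetical
# placeholders, not the authors' actual extraction sheet.
import random

import pandas as pd
from sklearn.metrics import cohen_kappa_score

extraction = pd.read_csv("extraction_items.csv")  # one row per tool x item

# Draw roughly 25% of tools for independent double coding.
tools = extraction["tool_id"].unique().tolist()
random.seed(42)
subset = random.sample(tools, k=max(1, round(0.25 * len(tools))))

double_coded = extraction[extraction["tool_id"].isin(subset)]

# 'coder1' and 'coder2' are the two raters' binary codes (item present/absent).
kappa = cohen_kappa_score(double_coded["coder1"], double_coded["coder2"])
print(f"Cohen's kappa on {len(subset)} double-coded tools: {kappa:.2f}")
```

If agreement on such a subset turned out to be high, full double coding could reasonably be skipped; if it were low, the extraction instructions might need refinement before continuing.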
Regarding the data to be extracted from each tool, I think this should be explained more in detail in the main text. In the current version of the manuscript, I found it difficult to see that this information could be found in the ‘Data Extraction Instructions’ file. I think the main manuscript should at least mention the domains for which information is going to be extracted (even if ‘ad-hoc’ items are added later). For example, you could explain that you will extract metadata such as title, format or whether they offer support, as well as content information related to aspects such as ‘Open & Reproducible Scholarship’ or ‘Validity’. I would also reference explicitly that all the items can be found in the ‘Data Extraction Instructions’ document (https://osf.io/ewm7x).
Regarding the data extraction items, I have some concerns. First, about the section ‘Open & Reproducible Scholarship Content’, I wonder whether items such as ‘Open access publication’, ‘Open source software used’ or ‘preprinting archiving’ are really necessary. While these are of course desirable properties of a study, I don't see how these aspects relate to the quality of a primary study. I would be surprised if any quality assessment tool for primary studies assessed this aspect. Therefore, I think not assessing them could save time. Besides, I don't understand how ‘Sample size estimator’ relates to ‘Open & Reproducible Scholarship Content’. Shouldn't this be an aspect of internal validity (because of their lower precision, underpowered studies have a higher probability of Type II error, and their statistically significant findings are more likely to be false positives)?
Similarly, regarding the section ‘Integrity Assessment’, I am not sure whether ‘Methodology assessment’ or ‘Analytical approach assessment’ are in the right place. Wouldn't ‘internal validity’ be more appropriate as well?
Also, although the current checklist of items is very comprehensive, it is likely that the tools you find will assess aspects that you have not yet considered. I see you mention that you will add these as ‘ad hoc’ items. Do I understand correctly that if such an item is found in the middle of the data extraction process, it will be evaluated retrospectively for all other tools?
Lastly, another aspect that might be interesting to know is whether there are studies that assess the validity and/or inter-rater reliability of the tools. I understand that looking for such studies may take considerable work and may be outside the scope of this review, but I think it is very important to know whether these tools are adequate. Relatedly, it might also be interesting to know whether any training is specified as necessary to apply these tools.