The impact of removing facial features on quality measures of structural MRI scans

Based on reviews by Catherine Morgan and Cassandra Gould van Praag
A recommendation of:

Defacing biases in manual and automated quality assessments of structural MRI with MRIQC


Submission: posted 28 November 2022
Recommendation: posted 30 May 2023, validated 31 May 2023
Cite this recommendation as:
Schwarzkopf, D. (2023) The impact of removing facial features on quality measures of structural MRI scans. Peer Community in Registered Reports.


Data sharing is perhaps the most fundamental step for increasing the transparency and reproducibility of scientific research. However, the goals of open science must be tempered by ethical considerations, protecting the privacy and safety of research participants. Bridging this gap poses challenges for many fields, including human neuroimaging. Brain images, as measured with magnetic resonance imaging (MRI), are unique to the participant and therefore contain identifying information by definition. One way to mitigate the risk to participants arising from public data sharing has been "defacing" the MRI scans, i.e., literally removing the part of the image that contains the face and surrounding tissue, while preserving the brain structure. However, this procedure also removes information that is not (or at least only minimally) identifiable. It also remains unclear whether defacing the images affects image quality and thus the information necessary for addressing many research questions.
The current study by Provins et al. (2023) seeks to address this question. Leveraging the publicly available IXI dataset, comprising hundreds of T1-weighted structural MRI scans, they will assess the effect of defacing on manual and automatic estimates of image quality. Specifically, the researchers will compare expert quality ratings for a subset of 185 images. They hypothesise that images from which facial features have been removed are typically assigned higher quality ratings. Moreover, using the full dataset of 580 images, acquired across three scanning sites, they will also test the impact of defacing MRI scans on automated quality measures obtained with the MRIQC software. The results of this study should have important implications for open science policy and for designing optimal procedures for sharing structural MRI data in an ethical way. For example, if the authors' hypothesis is confirmed, studies relying on MRI quality measures might be better served by a custodianship model, where identifiable data are shared under strict conditions, rather than by publishing defaced data. More generally, the outcome of this study may have significant legal implications in many jurisdictions.
The Stage 1 manuscript was evaluated at the initial triage stage by the recommender and the PCI:RR team, followed by a round of in-depth review by two experts. After a detailed response and substantial revisions, the recommender judged that the manuscript met the Stage 1 criteria and awarded in-principle acceptance (IPA).
URL to the preregistered Stage 1 protocol: (under temporary private embargo)
Level of bias control achieved: Level 2. At least some data/evidence that will be used to answer the research question has been accessed and partially observed by the authors, but the authors certify that they have not yet observed the key variables within the data that will be used to answer the research question AND they have taken additional steps to maximise bias control and rigour.
List of eligible PCI RR-friendly journals:
1. Provins, C., Savary, E., Alemán-Gómez, Y., Richiardi, J., Poldrack, R. A., Hagmann, P. & Esteban, O. (2023). Defacing biases in manual and automated quality assessments of structural MRI with MRIQC, in principle acceptance of Version 3 by Peer Community in Registered Reports.
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviewed by , 09 May 2023

I am happy with all the revisions made and look forward to seeing the study results in due course.

Evaluation round #2

DOI or URL of the report:

Version of the report: v2

Author's Reply, 02 Apr 2023

Decision by Sam Schwarzkopf, posted 08 Mar 2023, validated 09 Mar 2023

Dear authors

Your submission to PCI:RR has now been reviewed by two experts in the field. Both were impressed by your plan and agree that this is a worthwhile and timely study. However, there are various open points in need of clarification, and further details could be provided to ensure replicability. We therefore invite you to submit a revision. Please include a response letter addressing each reviewer comment point-by-point (including the responses to the more general RR questions that one reviewer raised). To facilitate a quick turnaround, please also include a version with changes tracked/highlighted. Also please ensure that the link in your submission leads directly to the manuscript. This can be the version with highlighted changes; upon in-principle acceptance of the Stage 1 manuscript, the highlights can be removed.

One note about the reviewer comments: for clarity, it is fine to state explicitly that some analyses are not planned. However, please do not include a description of any exploratory analyses for which you will not have a detailed preregistered plan. Exploratory analyses can always be added at Stage 2, provided they are explicitly labelled as such, but they are not part of the Stage 1 protocol.

Best wishes
Sam Schwarzkopf

Reviewed by , 01 Feb 2023

Evaluation round #1

DOI or URL of the report:

Version of the report: v1

Author's Reply, 04 Jan 2023

Decision by Sam Schwarzkopf, posted 30 Nov 2022, validated 30 Nov 2022

Dear authors

We regularly triage Stage 1 submissions before sending them out to expert reviewers, to ensure that various criteria for RRs are met. Your submission is already in great shape, but there are several smaller issues that I thought merit fixing to avoid confusing reviewers.

OSF link

Please ensure that when you submit, the OSF link points directly to the manuscript, not to the general OSF project. If you change or update the manuscript, the link may change and then break. This issue occurred in your previous submission; our team was able to salvage the correct link, but only by luck. Please ensure that the link to the manuscript works and points to the latest version when you submit.

Statements precluding outcome
Your manuscript is somewhat unusual for a Stage 1 RR in that it contains several statements that seem to preclude the outcome. In fact, you have whole Discussion and Conclusions sections. These are fine because they can be replaced at Stage 2 (only the Introduction and the Methods and Design are set at Stage 1). However, the second-to-last sentence of the Introduction could also be seen as precluding the outcome: "Furthermore, we argue that the initial QA/QC on unprocessed data of neuroimaging studies must be critically carried out before defacing to avoid these biases".

I realise that this is based on your pilot data and that you have a strong expectation of confirming those earlier results. Nevertheless, the results should not yet be known at this stage. Based on your current description, I judge the bias control of this project to be at the relatively high-risk Level 3 or 4 (see section 2.6 in the Guide for Authors), although your plan to use blinded, randomised rating should help mitigate this. I therefore advise you to be more circumspect in your expectations. You can certainly describe your expectations, but in a way that requires no further changes to the Introduction at Stage 2 if your results show the opposite.

Why only 3T data?
You say you will use only the 3T data for the manual rating. There are probably good reasons for this, but I suggest explaining them.

Hypotheses 1 and 2
To my reading, the first two hypotheses are really parts of the same question. In RRs it is particularly useful to condense the preregistered plan down to the simplest statistical comparison (a 1-df test) necessary to answer the research question. In your case this seems to be a one-tailed paired t-test (or a non-parametric alternative) on ratings between defacing statuses, plus your Bland-Altman plots. Does the ANOVA/LMM analysis in Hypothesis 1 add anything beyond that? If so, please explain.
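For illustration, the 1-df comparison suggested above could look like the following sketch. The rating arrays are simulated placeholders (not study data), and the array sizes merely echo the 185-scan subset mentioned in the protocol; the point is only the shape of the test, a one-tailed paired comparison with a non-parametric fallback.

```python
# Hypothetical sketch of a one-tailed paired comparison of quality ratings
# for the same scans with and without defacing. Data below are simulated
# placeholders, not the study's actual ratings.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ratings_defaced = rng.normal(3.2, 0.5, size=185)            # placeholder ratings
ratings_original = ratings_defaced - rng.normal(0.1, 0.3, size=185)

# One-tailed paired t-test: H1 = defaced scans receive higher ratings.
t_res = stats.ttest_rel(ratings_defaced, ratings_original, alternative="greater")

# Non-parametric alternative (Wilcoxon signed-rank test) if the normality
# of the paired differences is in doubt.
w_res = stats.wilcoxon(ratings_defaced, ratings_original, alternative="greater")

print(t_res.pvalue, w_res.pvalue)
```

Either test answers the single question directly; an ANOVA/LMM would only be needed if additional factors (e.g. site) are of genuine interest.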

Inconsistent power analysis
For a project like this, determining the minimal effect size for a prespecified power and alpha level makes sense. However, this seems to be applied inconsistently. For example, Figures 3 and 6 mention alpha=0.02, but in the text and the Design Table the same power analyses are described with alpha=0.05. Moreover, it would be worth stating the power level in the text, not only in the figure captions. Note that some RR-friendly journals expect alpha=0.02; if you plan to submit your final Stage 2 manuscript to one of these journals, this is indeed the threshold you should set.
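To see why the alpha discrepancy matters, the sketch below shows how the minimal detectable effect size shifts between alpha=0.05 and alpha=0.02. The sample size (n=185) echoes the manual-rating subset; the power level of 0.95 is an assumption for demonstration, not taken from the manuscript.

```python
# Hypothetical illustration: minimal detectable effect size (Cohen's d) for a
# one-tailed paired t-test at two alpha levels. n=185 mirrors the manual-rating
# subset; power=0.95 is assumed for demonstration only.
from statsmodels.stats.power import TTestPower

analysis = TTestPower()  # power analysis for one-sample/paired t-tests
for alpha in (0.05, 0.02):
    d = analysis.solve_power(nobs=185, alpha=alpha, power=0.95,
                             alternative="larger")
    print(f"alpha={alpha}: minimal detectable effect size d = {d:.3f}")
```

The stricter alpha=0.02 requires a somewhat larger minimal effect size at the same power and sample size, which is why the two thresholds must not be mixed across the text and figures.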

Minor issues

  • In first paragraph of Introduction: "...the ears themselves." The "themselves" doesn't seem to make sense to me (but I may be wrong, in which case ignore this comment)
  • Figure 4: when describing the 95% confidence intervals I assume you mean "dotted" not "dashed" lines (the latter are the means)?
  • Typo in Design Table, Hypothesis 1, Question: "bias" instead of "biases"
  • Also in Design Table, all cells of Rationale column: reported "in" Figure

Sam Schwarzkopf
