Recommendation

The impact of removing facial features on quality measures of structural MRI scans

D. Samuel Schwarzkopf based on reviews by Catherine Morgan and Cassandra Gould van Praag

A recommendation of:

STAGE 1

Defacing biases in manual and automated quality assessments of structural MRI with MRIQC

Céline Provins, Yasser Alemán-Gómez, Jonas Richiardi, Russell A. Poldrack, Patric Hagmann, Oscar Esteban https://osf.io/qcket version 3

Abstract

Defacing biases in manual and automated quality assessments of structural MRI with MRIQC

A critical requirement before data-sharing of human neuroimaging is removing facial features to protect individuals’ privacy. However, not only does this process redact identifiable information about individuals, but it also removes non-identifiable information. This may introduce undesired variability into downstream analysis and interpretation. Here, we pre-register a study design to investigate the degree to which the so-called defacing alters the quality assessment of T1-weighted images of the human brain from the openly available “IXI dataset”. The effect of defacing on manual quality assessment will be investigated on a single-site subset of the dataset (N=185). By means of repeated-measures analysis of variance (rm-ANOVA) or linear mixed-effects models in case data do not meet rm-ANOVA’s assumptions, we will determine whether four trained human raters’ perception of quality is significantly influenced by defacing by comparing their ratings on the same set of images in two conditions: “non-defaced” (i.e., preserving facial features) and “defaced”. Relatedly, we will also verify that defaced images are systematically assigned higher quality ratings. In addition, we will investigate these biases on automated quality assessments by applying multivariate rm-ANOVA (rm-MANOVA) on the image quality metrics extracted with MRIQC on the full IXI dataset (N=580; three acquisition sites). The analysis code, tested on simulated data, is made openly available with this pre-registration report. This study seeks evidence of the deleterious effects of defacing on data quality assessments by humans and machine agents.

biases, defacing, quality control, quality assessment, image quality metrics, iqms, manual ratings, MRIQC, MRI, structural MRI

Submission: posted 28 November 2022
Recommendation: posted 30 May 2023, validated 31 May 2023

Cite this recommendation as:
Schwarzkopf, D. (2023) The impact of removing facial features on quality measures of structural MRI scans. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=346

Recommendation

Data sharing is perhaps the most fundamental step for increasing the transparency and reproducibility of scientific research. However, the goals of open science must be tempered by ethical considerations, protecting the privacy and safety of research participants. Bridging this gap causes challenges for many fields, such as human neuroimaging. Brain images, as measured with magnetic resonance imaging (MRI), are unique to the participant and therefore contain identifying information by definition. One way to mitigate the risk to participants arising from public data sharing has been "defacing" the MRI scans, i.e., literally removing the part of the image that contains the face and surrounding tissue, while preserving the brain structure. This procedure however also removes information that is not (or at least minimally) identifiable. It also remains unclear whether defacing the images affects image quality and thus the information necessary for addressing many research questions.

The current study by Provins et al. (2023) seeks to address this question. Leveraging a publicly available "IXI dataset" comprising hundreds of T1-weighted structural MRI scans, they will assess the effect of defacing on manual and automatic estimates of image quality. Specifically, the researchers will compare image quality ratings by experts for a subset of 185 images. They hypothesise that images in which facial features have been removed are typically assigned higher quality ratings. Moreover, using a full data set of 580 images, which have been obtained across three scanning sites, they will also test the impact defacing MRI scans has on automated quality measures obtained with MRIQC software. The results of this study should have important implications for open science policy and for designing the optimal procedures for sharing structural MRI data in an ethical way. For example, if the authors' hypothesis is confirmed, studies relying on MRI quality measures might be better served by a custodianship model where identifiable data is shared under strict conditions, rather than relying on publishing defaced data. More generally, the outcome of this study may have significant legal implications in many jurisdictions.

The Stage 1 manuscript was evaluated at the initial triage stage by the Recommender and PCI:RR team, followed by a round of in-depth review by two experts. After a detailed response and substantial revisions, the recommender judged that the manuscript met the Stage 1 criteria and awarded in-principle acceptance (IPA).

URL to the preregistered Stage 1 protocol: https://osf.io/qcket (under temporary private embargo)

Level of bias control achieved: Level 2. At least some data/evidence that will be used to answer the research question has been accessed and partially observed by the authors, but the authors certify that they have not yet observed the key variables within the data that will be used to answer the research question AND they have taken additional steps to maximise bias control and rigour.

List of eligible PCI RR-friendly journals:

References

1. Provins, C., Savary, E., Alemán-Gómez, Y., Richiardi, J., Poldrack, R. A., Hagmann, P. & Esteban, O. (2023). Defacing biases in manual and automated quality assessments of structural MRI with MRIQC, in principle acceptance of Version 3 by Peer Community in Registered Reports. https://osf.io/qcket

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Reviewed by Catherine Morgan, 09 May 2023

I am happy with all the revisions made, look forward to seeing the study results in due course.

https://doi.org/10.24072/pci.rr.100346.rev31

Evaluation round #2

DOI or URL of the report: https://osf.io/up743

Version of the report: v2

Author's Reply, 02 Apr 2023

Download author's reply Download tracked changes file https://doi.org/10.24072/pci.rr.100346.ar2

Decision by D. Samuel Schwarzkopf, posted 08 Mar 2023, validated 09 Mar 2023

Dear authors

Your submission to PCI:RR has now been reviewed by two experts in the field. Both were impressed by your plan and agree that this is a worthwhile and timely study to conduct. However, various open points need clarification, and further details should be provided to ensure replicability. We therefore invite you to submit a revision. Please include a response letter in which you address each reviewer comment point by point (including responses to the more general RR questions that one reviewer raised). To facilitate a quick turnaround, please also include a version with changes tracked/highlighted. Also, please ensure that the link on your submission leads directly to the manuscript. This can be the version with highlighted changes; upon in-principle acceptance of the Stage 1 manuscript the highlights can be removed.

One note about the reviewer comments. For clarity, it is fine to explicitly state if some analyses are not planned. However, please do not include any description of any exploratory analyses for which you will not have a detailed preregistered plan. Exploratory analyses can always be added at Stage 2, provided they are explicitly labelled as such, but they are not part of the Stage 1 protocol.

Best wishes
Sam Schwarzkopf

https://doi.org/10.24072/pci.rr.100346.d2

Reviewed by Catherine Morgan, 01 Feb 2023

Download the review https://doi.org/10.24072/pci.rr.100346.rev21

Reviewed by Cassandra Gould van Praag, 08 Mar 2023

Download the review https://doi.org/10.24072/pci.rr.100346.rev22

Evaluation round #1

DOI or URL of the report: https://osf.io/up743?view_only=0aaeab04c8b34861b4411031e66d57e5

Version of the report: v1

Author's Reply, 04 Jan 2023

Download author's reply https://doi.org/10.24072/pci.rr.100346.ar1

Decision by D. Samuel Schwarzkopf, posted 30 Nov 2022, validated 30 Nov 2022

Dear authors

We regularly triage Stage 1 submissions before sending them out to expert reviewers to ensure various criteria for RRs are met. Your submission is already in great shape, but there are several smaller issues that I think merit fixing to avoid confusing reviewers.

OSF link

Please ensure that, when you submit, the OSF link points to the manuscript directly, not the general OSF project. If you change or update the manuscript, its link will change as well, so the old link may then be broken. This issue occurred in your previous submission; our team was able to salvage the correct link, but only by luck. Please ensure that the link to the manuscript works and points to the latest version when you submit.

Statements precluding outcome
Your manuscript is somewhat unusual for a Stage 1 RR in that there are several statements that seem to preclude the outcome. In fact, you have a whole Discussion and Conclusions section. These are fine because they can be replaced at Stage 2 (only the Intro, Methods and Design are set at Stage 1). However, the second-to-last sentence in the Introduction could also be seen as precluding the outcome: "Furthermore, we argue that the initial QA/QC on unprocessed data of neuroimaging studies must be critically carried out before defacing to avoid these biases".

I realise that this is based on your pilot data and that you have a strong expectation that you will confirm those earlier results. Nevertheless, the results should not yet be known at this stage. Based on your current description, I judge the bias control level of this project to be a relatively high-risk Level 3 or 4 (see section 2.6 in the Guide for Authors), but your plan to use blinded, randomised rating should help mitigate this. Nevertheless, I advise you to be more circumspect in your expectations. You can certainly describe your expectations, but in a way that requires no further changes to the Intro at Stage 2 if your results show the opposite.

Why only 3T data?
You say you will only use the 3T data for the manual rating. There are probably good reasons for that, but I would suggest explaining them.

Hypotheses 1 and 2
To my reading, the first two hypotheses are really part of the same question. In RRs it is particularly useful to condense the preregistered plan down to the simplest statistical comparison (1-df test) necessary to answer the research question. In your case this seems to be a one-tailed paired t-test or non-parametric alternative on ratings between defacing statuses, plus your Bland-Altman plots. Is the ANOVA/LMM analysis in Hypothesis 1 adding anything to that? If so, please explain.
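For illustration only, the simple comparison described above could be sketched as follows. This is a minimal sketch using the Python standard library, not the authors' analysis code: the function names and the six rating pairs are made up, and the p-value lookup against a t distribution (or a non-parametric alternative) is left to a statistics package.

```python
import math
import statistics


def paired_differences(defaced, nondefaced):
    """Per-image differences (defaced minus non-defaced rating)."""
    if len(defaced) != len(nondefaced):
        raise ValueError("both conditions must rate the same images")
    return [d - n for d, n in zip(defaced, nondefaced)]


def paired_t_statistic(defaced, nondefaced):
    """Return (t, df) for the paired t-test on the differences.

    For the one-tailed hypothesis "defaced images are rated higher",
    a large positive t supports the hypothesis.
    """
    diffs = paired_differences(defaced, nondefaced)
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1


def bland_altman_limits(defaced, nondefaced):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    diffs = paired_differences(defaced, nondefaced)
    bias = statistics.fmean(diffs)
    spread = 1.96 * statistics.stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical ratings for six images (NOT data from the study).
nondefaced_ratings = [2.0, 3.0, 3.0, 4.0, 2.0, 3.0]
defaced_ratings = [3.0, 3.0, 4.0, 4.0, 3.0, 4.0]

t_value, df = paired_t_statistic(defaced_ratings, nondefaced_ratings)
bias, lower, upper = bland_altman_limits(defaced_ratings, nondefaced_ratings)
print(f"t({df}) = {t_value:.3f}, bias = {bias:.3f}, LoA = [{lower:.3f}, {upper:.3f}]")
```

The point of the sketch is that the whole confirmatory question reduces to one statistic on the per-image differences, with the Bland-Altman limits describing the same differences graphically.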

Inconsistent power analysis
For a project like this, determining the minimal effect size for a prespecified power and alpha level makes sense. However, this seems to be inconsistently applied. For example, Figures 3 and 6 mention an alpha=0.02 but in the text and the Design Table the same power analyses are described as alpha=0.05. Moreover, it would be worth mentioning the power level in the text, not only the figure captions. Note that some RR-friendly journals expect an alpha=0.02 - if you plan to submit your final Stage 2 manuscript to one of these journals this is indeed the threshold you should set.
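To make the dependence on alpha concrete, here is a rough sketch of how the minimal detectable effect size changes between alpha=0.05 and alpha=0.02. It uses a normal approximation for a one-tailed paired comparison, which is an assumption and not necessarily the method used in the manuscript; the function name is hypothetical, N=185 is taken from the abstract, and the 90% power level is only an illustrative placeholder because the power level is not stated here.

```python
import math
from statistics import NormalDist


def min_detectable_effect(n, alpha, power, one_tailed=True):
    """Smallest standardised paired effect (d_z) detectable with n pairs.

    Normal approximation: d_z = (z_alpha + z_power) / sqrt(n).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha if one_tailed else 1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) / math.sqrt(n)


N_IMAGES = 185        # single-site subset size stated in the abstract
ASSUMED_POWER = 0.90  # placeholder value, an assumption for illustration

d_alpha_05 = min_detectable_effect(N_IMAGES, alpha=0.05, power=ASSUMED_POWER)
d_alpha_02 = min_detectable_effect(N_IMAGES, alpha=0.02, power=ASSUMED_POWER)
print(f"alpha=0.05: d_z >= {d_alpha_05:.3f}")
print(f"alpha=0.02: d_z >= {d_alpha_02:.3f}")
```

Under these assumptions the stricter threshold raises the minimal detectable effect, which is why the figures, text and Design Table need to agree on a single alpha.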

Minor issues

In first paragraph of Introduction: "...the ears themselves." The "themselves" doesn't seem to make sense to me (but I may be wrong, in which case ignore this comment)
Figure 4: when describing the 95% confidence intervals I assume you mean "dotted" not "dashed" lines (the latter are the means)?
Typo in Design Table, Hypothesis 1, Question: "bias" instead of "biases"
Also in Design Table, all cells of Rationale column: reported "in" Figure

Sam Schwarzkopf

https://doi.org/10.24072/pci.rr.100346.d1