Recommendation

Testing antidotes to online toxicity

Chris Chambers based on reviews by Corina Logan and Marcel Martončik

A recommendation of:

STAGE 1

Responding to Online Toxicity: Which Strategies Make Others Feel Freer to Contribute, Believe That Toxicity Will Decrease, and Believe that Justice Has Been Restored?

Alison I. Young Reusser, Houghton University; Kristian Veit, Olivet Nazarene University; Lisa Gassin, Olivet Nazarene University; Jonathan Case, Houghton University https://osf.io/hfjnb version v3

Abstract

When we encounter toxic comments online, how might individual efforts to reply to those comments improve others’ experiences conversing in that forum? Is it more helpful for others to publicly, but benevolently (with a polite tone, demonstrated understanding of the original comment, and empathy for the commenter; Young Reusser et al., 2021), correct the post? Is going along with or joking along with the commenter in a benevolent way helpful? Or is retaliating – returning toxicity for toxicity – the best strategy? Using real Reddit conversation pairs – a toxic comment followed by a reply – as stimuli, we conducted a pilot study (n = 126 participants) and propose an experiment (proposed n = 1122 participants) investigating the impact of three kinds of replies to online toxicity (benevolent correction, benevolent going-along, or retaliation) on observers’ self-reported freedom to contribute to the conversation, their belief that the toxicity will be reduced, and their overall impression that justice has been restored. We found evidence that… These findings suggest…

Keywords: online toxicity, benevolence, online conversation, Reddit, empathy

Submission: posted 08 June 2022
Recommendation: posted 23 January 2023, validated 23 January 2023

Cite this recommendation as:
Chambers, C. (2023) Testing antidotes to online toxicity. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=215

Related stage 2 preprints:

Responding to Online Toxicity: Which Strategies Make Others Feel Freer to Contribute, Believe That Toxicity Will Decrease, and Believe that Justice Has Been Restored?
Alison I. Young Reusser, Kristian M. Veit, Elizabeth A. Gassin, and Jonathan P. Case
https://osf.io/k46e8

Recommendation

Social media is a popular tool for online discussion and debate, bringing with it various forms of hostile interactions – from offensive remarks and insults to harassment and threats of physical violence. The nature of such online toxicity has been well studied, but much remains to be understood regarding strategies to reduce it. Existing theory and evidence suggest that a range of responses – including those that emphasise prosociality and empathy – might be effective at mitigating online toxicity. But do such strategies work in practice?

In the current study, Young Reusser et al. (2023) propose an experiment to test the effectiveness of three types of responses to online toxicity – Benevolent Correction (including disagreement), Benevolent Going Along (including joking/agreement), or Retaliation (additional toxicity) – on how able participants feel to contribute to conversations, their belief that the toxicity would be reduced by the intervention, and their belief that justice had been restored. The findings promise to shed light on approaches for improving the health of online discourse.

The Stage 1 manuscript was evaluated over two rounds of in-depth review. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).

URL to the preregistered Stage 1 protocol: https://osf.io/hfjnb (under temporary private embargo)

Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.

List of eligible PCI RR-friendly journals:

References

1. Young Reusser, A. I., Veit, K. M., Gassin, E. A., & Case, J. P. (2023). Responding to Online Toxicity: Which Strategies Make Others Feel Freer to Contribute, Believe That Toxicity Will Decrease, and Believe that Justice Has Been Restored? In principle acceptance of Version 3 by Peer Community in Registered Reports. https://osf.io/hfjnb

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Evaluation round #2

DOI or URL of the report: https://osf.io/zux3y?view_only=2b45b35cf37e46e5818a40bf79fc981d

Version of the report: v2

Author's Reply, 04 Jan 2023

Author's reply and tracked changes file: https://doi.org/10.24072/pci.rr.100215.ar2

Decision by Chris Chambers, posted 02 Nov 2022, validated 02 Nov 2022

The two reviewers from the first round kindly returned to evaluate your revised manuscript. As you will see, both are quite positive about the revision while also noting a few remaining areas that need attention. One of the major points highlighted by both reviewers is improving the clarity of the prospective interpretation given different outcomes. As regards the comments by Marcel Martončik concerning the smallest effect size of interest (SESOI), the reviewer is correct that RRs are generally most appropriate for confirmatory studies that can specify these boundaries precisely based on prior research or theory. However, as you also note, when the SESOI is unknown then there is still a place for the RR format in providing an unbiased estimate for future studies. The key in such cases is to ensure that the interpretation given different outcomes is clear (as Corina Logan notes). For example, in the first row of the design table (p11), replace "The two other conditions do not differ" with "The two other conditions do not differ significantly", and concerning: "If the Benevolently Going Along condition’s mean is similar to either other condition, this hypothesis would be disconfirmed" -- what is the definition of "similar"? It is important to be as precise as possible so that all hypotheses are falsifiable, with sources of potential interpretative bias closed off as much as possible.

Please also respond carefully to all other comments from the reviewers. I look forward to receiving your revised manuscript in due course.

https://doi.org/10.24072/pci.rr.100215.d2

Reviewed by Corina Logan, 19 Oct 2022

Dear Alison Young Reusser,

You did an excellent job of revising the manuscript, thank you for addressing the comments so well! It makes the manuscript much clearer. I have only a few comments on this version.

Table 1 > Q1 > Interpretation: “This hypothesis is agnostic as to the difference between the other two conditions”. What would your interpretation be if the Benevolent Going Along’s mean was higher than the other two conditions? An indication of how you would interpret all possible results is warranted, even if some results seem highly unlikely, because this outcome is still a possibility. Specifying these before collecting the data will make these interpretations much more robust in the event that this unlikely result occurs. The same comment applies to Q2 and Q3.

Regarding point 11 in the author’s response, if the retaliatory category is not going to be analyzed, and it looks like it won’t because it isn’t part of this manipulation check, then remove this data from the data set that is being analyzed and only include the two benevolent categories in the analysis.

Figures 2 and 3. It seems like the blue dots are there to delineate vertical lines from the x-axis? I would eliminate them to reduce confusion that they are data points. If they represent something about the data, please state this in the legends.

Page 31 “We plan to recruit 800 participants”. I believe this number is now over 1000 given the revision?

Regarding comment 26 in the author’s response, I think this was the text that was added for clarification?

“Analyses will be conducted both including the covariates (perceived toxicity of the initial comment (if it differs by condition at the .05 level), willingness to self-censor, and comfort with offensive language) and without, and the effect of condition will be reported for both.”

If so, I think an explanation should be added about why with and without the covariates are being analyzed and which set of analyses should be the ones used for coming to final conclusions.

Supplementary material at https://osf.io/wa8f3?view_only=2b45b35cf37e46e5818a40bf79fc981d, Figure 1. Please label the y-axes with language used throughout the article (Benevolent Correcting, etc) and remove the word Composite because it is confusing. Unless the word composite is important, in which case it should be explained in the legend.

All my best,

Corina Logan

https://doi.org/10.24072/pci.rr.100215.rev21

Reviewed by Marcel Martončik, 02 Nov 2022

Review: https://doi.org/10.24072/pci.rr.100215.rev22

Evaluation round #1

DOI or URL of the report: https://osf.io/wjbvf?view_only=f310727e6bf64d38b761a942e646df25

Author's Reply, 12 Oct 2022

Author's reply and tracked changes file: https://doi.org/10.24072/pci.rr.100215.ar1

Decision by Chris Chambers, posted 25 Aug 2022

I have now received two very detailed and constructive reviews of your submission. As you will see, the comments are extensive and identify a range of areas requiring careful revision in order to satisfy the Stage 1 criteria. Without providing an exhaustive overview, the main issues to address across both reviews are: the clarity of the research questions including the strength and clarity of the theoretical framing, the sample size justification, the precision of predictions and in particular the precision of the contingent interpretations given different outcomes, clarifying the definition of constructs and measurements, and consideration (and clarification) of the manipulation checks. The reviewers also offer valuable suggestions for improving the clarity and structure of the presentation in key places.

Overall, based on my own reading I think your submission is promising, and if you can provide a comprehensive revision and response that satisfies the reviewers, then I believe your manuscript will eventually be suitable for Stage 1 acceptance.

https://doi.org/10.24072/pci.rr.100215.d1

Reviewed by Corina Logan, 05 Aug 2022

This Stage 1 Registered Report (RR) aims to test three hypotheses about how free participants feel in contributing to online conversations with toxic comments, and whether participants feel a specific toxic comment or situation has been addressed and resolved by a given response to that comment.

I applaud the authors for fleshing out predictions for multiple possibilities of the outcomes - it is such a great way to a priori consider how you will interpret whichever result ends up being supported and to make these alternatives part of the whole research program (rather than just discussing a favorite prediction, which might not be supported).

The RR is well developed and carefully thought out. Please see my comments below (minor and major mixed together, following the page numbers of the RR) for ways in which I think it could be clearer and for a couple of (surmountable) issues.

Abstract - state what the n=126 and n=800 refers to…the number of Reddit conversations? Or comment-reply pairs? Or people?

Page 3, par 1, sentence 1: perhaps start with a broader sentence to introduce the idea for your article and why this topic matters. Starting with the big problem that you are aiming to solve could be a good angle. And then it would make sense why you are jumping in to using Google’s codebook, definitions, and why it matters how people respond to toxic posts. Explain what API stands for.

Page 3 “While Kolhatkar and Taboada (2017) have argued that comment toxicity is unrelated to its ability to promote civil” - clarify what “its” refers to. Reddit? And clarify whether you think that responses to news articles will be different from interpersonal interactions. As a reader, I don’t know how to interpret this sentence as it relates to your research - does this study have an impact on the interpretation of your results? Or are news articles a different context and you think the responses there won’t be relevant to your context?

Page 4 - “one-on-one conversation can persuade the original commenter to change their views” - in what context? Change views about beliefs or change views about participating in an online conversation? It seems like the former because I assume that the one on one conversation happens in person? If that is the case, it would be good to make an argument about whether in person interactions apply to online interactions to predict whether this finding would apply to your research question’s online context.

Page 5 - “Are there any differences among them in how free participants feel to participate?” - differences among what? The three strategies you outlined in the previous sentence?

Page 5 - “Perhaps benevolent correction of the toxicity is the best strategy” - the best strategy for what and in what context? I can imagine that the best strategy could differ depending on the goals/motivations of the forum/commenter/observer and whether repeated interactions were required with these individuals in the future.

Page 6, Hypothesis 1a - how are “benevolent replies” different from 1b “benevolent corrections”? It seems like the latter would be a sub category of the former, but it just depends on how you categorized each term and whether there is overlap in the data that will be used to evaluate each hypothesis (i.e., all of the data from 1b is included in the 1a analysis). This becomes clear later in the RR, but I think it would be good to mention here near the beginning for clarity.

Page 7 - “had more respect for the second person if they condemned vs. empathized with the target”. I’m not clear on which condition elicited more respect for the target: if the observer had an attitude toward the target that was condemning or if they empathized with the target. Could you provide more detail?

Study design table > Interpretation given different outcomes: how will you determine whether or not there is a difference between the means?

Study design table > Q2 > rightmost column: replace “I” with “it” in “If H2a is supported, I…”

Study design table > Manipulation check - correcting > Hypothesis - should retaliatory be added to this cell? It looks like it because the retaliatory condition is in the ANOVA and in the interpretation.

Study design table > Manipulation check - toxicity > Hypothesis - “Ensure the first impression of each toxic commenter is similar across conditions.” The first impression of the participant as they participate in the experiment? Or the first impression of the experimenters who are categorizing the comments as toxic, benevolent, etc? Again, this becomes clear later on, but good to mention early in the RR to help readers follow.

Page 11 - for the interrater results, please state what test was used.

Page 11 - “The research assistants also re-rated the toxicity of each initial comment” - will you clarify how the toxic comments were classified as you did for the benevolent comments? Was a comment classified as toxic if it received a 1 or less on the benevolence scale? Or did toxic comments have their own scale? A few more details would be helpful here.

Figure 1 legend - please explain the x and y axes here, the sample sizes for each panel, what each dot represents, and what the violin shape represents. Also, a summary of the take home message would be useful. Do you need to cite the data you used here or is the data unpublished?

Page 12 - “A pdf of our Qualtrics survey and deidentified pilot data can be found…” Please indicate the file name so readers know where to find this data. I didn’t see a pdf of the Qualtrics survey at the OSF project.

Page 14, top par - how were the researcher-selected replies rated on the benevolence/retaliatory scales? If they weren’t rated, then why were these treated differently and how were they categorized?

Page 14, 2nd par - just to clarify, a “conversation pair” is a comment-reply pair? It would be good to either make sure this is clear throughout or change the term to something more intuitive.

Pilot study - throughout this section there are alphas reported, however it is not clear what they refer to - interrater reliability of a particular interpretation of, for example, the toxicity of the initial comment? Please clarify throughout and include the name of the test and a description of what the statistic represents.

Page 16 - “Social media use was included to describe our sample” How does social media use describe your sample?

Page 16 - should you list your IRB protocol number? I’m not sure how it works with studies on humans, but studies on non-humans have to list this in all articles.

Page 16 - please clarify that pair 1-12 means 4 comment-reply pairs multiplied by 3 conditions.

Page 16, last par - please show the data from the other benevolent condition as well so readers can evaluate what a “marginal” difference is.

Page 16, last par - “The effect of condition was not significant, however, given that the difference between the retaliatory and benevolent correction conditions was marginal (planned comparison t(114) = -1.89, p = .061), we decided to control for the first impression in all multilevel analyses”

It looks like the “marginal” difference was determined based on p=0.061? If that is the case, what was your preplanned cut off for determining whether there was a difference between conditions/means/etc or not? If the cut off was p=0.05, then there is no “marginal”. It is either on one side of the threshold or not (see references below for further discussion on this topic). I realize this was for your pilot study and not your proposed study, however your decision to include first impression as a fixed effect in the analyses for the proposed study is likely based on this finding. If this is the case, because of your non-significant finding in the pilot study, the first impression should be removed from the proposed analyses.
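The dichotomous decision rule described in this comment can be illustrated with a minimal Python sketch. The threshold of .05 is an assumption taken from the "if the cut off was p=0.05" wording above, the p value is the pilot comparison quoted from the manuscript, and the function name is hypothetical.

```python
# Assumed pre-planned significance threshold (the ".05 level" mentioned in the review).
ALPHA = 0.05


def conditions_differ(p_value: float, alpha: float = ALPHA) -> bool:
    """Return True only if p falls below the pre-planned threshold.

    There is no third "marginal" outcome: the result is either on one
    side of the threshold or on the other.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    return p_value < alpha


# Pilot planned comparison quoted above: t(114) = -1.89, p = .061
pilot_p = 0.061
print(conditions_differ(pilot_p))  # False
```

Under this rule the pilot comparison is classed as "no difference", which is the basis for the suggestion to drop first impression as a fixed effect.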

Figures 3 & 4 legends - please clarify what the circles refer to - the means?

Pages 18-20 - “without covariates” is mentioned a few times, but I’m not sure what this means when the analyses were run with the covariates.

Page 19 - “Comfort with offensive language was not related to toxicity addressed, p = .23” Please add the rest of the test statistics here as in the other sentences.

Legends for Figures 4 & 5 and those thereafter as well - please add how to interpret the y axis. Do negative numbers mean participants felt like the toxicity was made worse, 0 = toxicity not addressed, and positive = toxicity addressed?

Given that the pilot study found no significant correlations for 2 of the 3 hypotheses, it might be a good idea to add to the study design table in the Interpretation column how you will interpret when there is no correlation and what theory this would contradict.

Also, was the pilot conducted according to the hypotheses in this RR? It would be good to note what the pilot study hypotheses were at the beginning of its section.

Page 21, pars 1 & 2 - please omit the sentences that mention “weak evidence” and “marginal” - these were not statistically significant, which is the measure you chose to determine whether there were differences or not (see my comment above and references below).

Page 24 and throughout - “This suggests that the manipulation of how benevolent and how correcting the Reddit conversations were was/was not successful” According to how I understand the experiment, I think you categorized the responses and not that you manipulated the responses or the participants. When I think of a manipulation, I think of designing the experiment such that the behavior of the participants changes across the study because of the experiment. If you agree, I would replace the term manipulation with categorization or something similar.

Page 25 - explain what ICC is and how to interpret it on first mention.
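As one possible way to make the ICC concrete for readers, the sketch below computes a one-way random-effects intraclass correlation, ICC(1), from a balanced set of clustered ratings. This is an illustration under assumptions: the manuscript may use a different ICC variant (for example one estimated from a multilevel model), and the function name and toy data are hypothetical.

```python
from statistics import mean


def icc1(groups):
    """One-way random-effects ICC(1) for balanced clusters.

    groups: list of equal-length lists, one list of ratings per cluster
    (e.g. one list per participant). Values near 1 mean most variance
    lies between clusters; values near 0 (or below) mean ratings within
    a cluster are no more alike than ratings from different clusters.
    """
    n = len(groups)                      # number of clusters
    k = len(groups[0]) if groups else 0  # ratings per cluster
    if n < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least 2 balanced clusters of size >= 2")

    grand_mean = mean(x for g in groups for x in g)
    cluster_means = [mean(g) for g in groups]

    # Between-cluster and within-cluster mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in cluster_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, cluster_means) for x in g
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_between - ms_within) / denominator


# Toy data (invented): ratings identical within clusters, different between them.
print(icc1([[1, 1], [5, 5]]))  # 1.0
```

A short interpretation note of this kind in the manuscript (what counts as a cluster, and what a high or low value implies) would address the comment.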

Page 26, last sentence - there is only a p value place holder; please add the rest of the statistics as in the rest of the paragraph.

Page 32 - did all co-authors approve the submitted version for publication? It looks like only one author did, however all authors need to approve of articles submitted on their behalf.

Throughout:

- the terms “benevolent going-along” and “benevolent endorsement” are used in different places. I would choose one term and stick with it to avoid confusion.

- the axis labels look like they are the raw variable names and would be clearer if they were relabeled to assist readers with interpretation.

>Assessing the RR according to PCI RR’s Stage 1 criteria:

>1A. The scientific validity of the research question(s).

The research questions are scientifically valid.

>1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable.

The proposed hypotheses are logical, rational, and plausible, and I suggested adding interpretations for the possibility that there are no correlations (see above).

>1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).

The methodology and analyses are feasible and I suggested a change to improve the soundness (see the comment on marginal significance above).

>1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.

The methodological detail is clear and replicable. I had a suggestion regarding the analysis pipeline to further reduce analytical flexibility (see above regarding marginal significance).

>1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).

The authors conduct categorization validation checks to ensure the three types of responses were perceived as belonging to their assigned categories.

I wish you the best of luck in conducting your study!

All my best,

Corina Logan

Max Planck Institute for Evolutionary Anthropology

References

Lybrand et al. 2021. Investigating the Misrepresentation of Statistical Significance in Empirical Articles. https://dc.etsu.edu/honors/646/

Nuzzo. 12 February 2014. “Scientific method: statistical errors”. https://www.nature.com/articles/506150a

Otte et al. 2021. Almost significant: trends and P values in the use of phrases describing marginally significant results in 567,758 randomized controlled trials published between 1990 and 2020. https://doi.org/10.1101/2021.03.01.21252701

Pritschet et al. 2016. Marginally Significant Effects as Evidence for Hypotheses: Changing Attitudes Over Four Decades. https://statmodeling.stat.columbia.edu/wp-content/uploads/2016/06/Pvalues.pdf

https://doi.org/10.24072/pci.rr.100215.rev11

Reviewed by Marcel Martončik, 14 Aug 2022

Review: https://doi.org/10.24072/pci.rr.100215.rev12
