Recommendation

The Representativeness Heuristic Revisited: Registered Replication Report of Kahneman and Tversky (1973)

Recommendation by Rima-Maria Rahal, based on reviews by Peter Anthony White, Regis Kakinohana and Naseem Dillman-Hasso
A recommendation of:

Representativeness heuristic in intuitive predictions: Replication Registered Report of problems reviewed in Kahneman and Tversky (1973)

Submission: posted 29 November 2023
Recommendation: posted 29 May 2024, validated 31 May 2024
Cite this recommendation as:
Rahal, R.-M. (2024) The Representativeness Heuristic Revisited: Registered Replication Report of Kahneman and Tversky (1973). Peer Community in Registered Reports. https://rr.peercommunityin.org/PCIRegisteredReports/articles/rec?id=609

Recommendation

Revisiting a true classic, this registered replication report addresses Kahneman and Tversky’s (1973) introduction of the representativeness heuristic. The heuristic refers to judgments that deviate from normative evaluations of the evidence when a stimulus fits a prototype. For instance, when an individual is described by features stereotypically associated with a certain target group (e.g., a person who attends dance training several times a week and has a passion for singing and performing), likelihood judgments that the individual belongs to a target group (e.g., K-Pop artists) rather than a non-target group (e.g., accountants) are inflated.

The impact of the original research on the field is clearly immense and long-lasting. All the better that a systematic replication attempt is being undertaken in this registered report, which addresses studies 1 through 7 of Kahneman and Tversky’s classic 1973 paper. Chan and Feldman (2024) propose a well-powered online study, in which all studies from the original article are presented to participants within-subjects. The materials are carefully constructed and closely documented in the accompanying OSF project, where in-depth information on planned data analyses is supported with a simulated dataset.  

The Stage 1 manuscript was evaluated over three rounds of in-depth review. Based on detailed responses to the reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).
 
URL to the preregistered Stage 1 protocol: https://osf.io/er2cq
 
Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question existed prior to in-principle acceptance, and no part will be generated until after IPA.
 
List of eligible PCI RR-friendly journals:
 
 
References

1. Chan, H. C. & Feldman, G. (2024). Representativeness heuristic in intuitive predictions: Replication Registered Report of problems reviewed in Kahneman and Tversky (1973). In principle acceptance of Version 5 by Peer Community in Registered Reports. https://osf.io/er2cq
 
2. Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80(4), 237–251. https://doi.org/10.1037/h0034747
 
Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Evaluation round #3

DOI or URL of the report: https://osf.io/6fwh2

Version of the report: 4

Author's Reply, 24 May 2024


Revised manuscript: https://osf.io/9cqp6

All revised materials uploaded to: https://osf.io/8zhcj/, updated manuscript under sub-directory "PCIRR Stage 1\PCI-RR submission following R&R 3"

Decision by Rima-Maria Rahal, posted 14 May 2024, validated 14 May 2024

Thank you for the revisions you provided. In my reading of the manuscript and the response letter, only one minor issue remains to be addressed. Please clarify the wording regarding the target sample size. In the manuscript, you write that the minimum sample size is 800 participants. Because of this wording, it remains unclear when exactly sampling will be stopped and under which conditions data will be used. In the response letter you clarified that you set the target of 800 participants on Prolific, but that there will be timed-out responses that will be reimbursed but not counted towards the target sample size, and presumably not used in the analyses. Please ensure that the wording used in the manuscript is unambiguous regarding which data will be obtained and which data will and will not be used for the subsequent analyses.

Evaluation round #2

DOI or URL of the report: https://osf.io/y94ug

Version of the report: 2

Author's Reply, 11 May 2024


Revised manuscript: https://osf.io/6fwh2

All revised materials uploaded to: https://osf.io/8zhcj/, updated manuscript under sub-directory "PCIRR Stage 1\PCI-RR submission following R&R 2"

Decision by Rima-Maria Rahal, posted 10 May 2024, validated 10 May 2024

I have now received three re-reviews. A few remaining issues need to be addressed:

  • ensure that the target sample size is clear 
  • consider expanding the discussion on the role that replicating this particular set of studies plays for the existing literature in the area

Please address these points in a final revision and response, and we will then be ready to move forward with Stage 1 IPA.

Reviewed by ORCID_LOGO, 05 Apr 2024

The authors addressed all my comments with detailed responses and adjustments to the manuscript. Therefore, I have no further questions or suggestions regarding Stage 1.
Best Regards.

Reviewed by ORCID_LOGO, 30 Apr 2024

Thank you again for the opportunity to review this RR. I greatly appreciate the authors’ detailed consideration of feedback and critiques from all reviewers, with the goal of making this project stronger. I have almost no comments for this round. I believe this project is ready for data collection.


Follow ups to first round comments:

  • Regarding the design of the replication (i.e., all studies run by each participant), I appreciate the overview of the work your team and others have conducted. I stand corrected in terms of what the previous research shows, and while I confess that I still do not like the design, I struggle to raise a reasonable objection at this point besides “it’s just not what I like.” Given that, and the overwhelming evidence, I accept the current design.
  • I appreciate the restructuring of the manuscript; I believe it reads much better now.
  • Regarding using 99 as a missing data code, I still stand by my point: I would use NA for missing values, as opposed to 99. It may seem implausible to you, but given the commitment to sharing datasets publicly for reuse, it is important to consider that others may not see 99 as implausible, or may not fully read through data dictionaries. All missing values in my opinion should be replaced with “NA”; I offered 999 as an alternative option if the authors would prefer to use a specific value. It’s not our job to determine what a reasonable age is, as you hinted in the response. Additionally, using NA for missing values makes many descriptive and analytic functions in R easier to implement for others who wish to review your datasets (i.e., mean(df$age, na.rm = TRUE) requires no additional coding of missing values); a minimal sketch of this recoding follows this list.
  • Regarding .sav/.csv files, I understand the difficulties with CSV files and that .sav files solve many of those issues. I would still prefer a .csv file, given that individuals without SPSS or PSPP cannot easily open a .sav file without importing it through other software, but given the other benefits of .sav files, I am fine with this.
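
A minimal sketch of the recoding suggested above, in R; the data frame name df and the column age follow the reviewer's example, and 99 is the placeholder code currently used for missing age:

    # replace the placeholder missing-value code 99 in age with NA
    library(dplyr)

    df <- df %>%
      mutate(age = na_if(age, 99))   # use 999 instead if that code is adopted

    # standard functions then handle missingness without extra recoding
    mean(df$age, na.rm = TRUE)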

 

Second minor round comments:

  • Clarify the “minimum sample of 800 participants”: under what conditions will it be different? If you collect more than 800 participants, will extra data be discarded, or still used?
  • I would recommend updating to R version 4.4.0 and using that, given a recent major vulnerability reported: https://nvd.nist.gov/vuln/detail/CVE-2024-27322; https://hiddenlayer.com/research/r-bitrary-code-execution/

Reviewed by Peter Anthony White, 10 Apr 2024

Comments on the revision.

 

The authors' replies go through my comments one by one, so here are my corresponding responses to each.

 

1. I am happy with the revisions made in response to this.

 

2. I perhaps didn't express myself very well there. I wasn't trying to argue that there wasn't any need or value in replicating the studies, only that the eventual publication should have a sound justification for it, which means setting it in the context of subsequent developments in that research area. If there is real doubt as to whether the findings would replicate or not, then certainly the replication attempt is justified, and the subsequent literature does seem to justify feeling some doubt about it. I think I just wanted the authors to have a clear idea about what it would mean if the findings were replicated, and also what it would mean if they were not replicated.

 

3. The authors' reply is satisfactory - I just wanted to be sure that readers of the eventual publication would get a clear understanding from the paper of why the replication matters.

 

4. O.K.

 

5. I don't have any idea of a measure of confidence that would be trustworthy, but I stand by my original comment that explicit judgments of confidence are prone to response biases. Replicability is not a guide to trustworthiness because it might mean only that the same response biases were operating in both the original and the replication. I'm not saying the authors shouldn't obtain confidence judgments, just that they should perhaps include some nuanced discussion of the results when they get them. What they have said in their reply to my comment is the right sort of thing, in my view.

 

6. O.K.

 

7. In the original submission it reads "scores supposedly either representing academic achievement, mental concentration, and sense of humour". The two choices are (i) remove "either" and (ii) change "and" to "or". The authors can go for whichever of those they prefer. Apologies for being a bit pedantic about this.

 

8. O.K.

 

9. O.K., that is very useful. The research I do pretty much has to be done face-to-face, so I have never explored online alternatives. The use of Prolific is probably more common in some areas of psychology than others.

 

10. It is the "If things fail..." that concerns me. A simple way to deal with the problem would be to analyse separately for each experiment the data from the participants for whom it was the first one they saw - at that point their judgments could not be affected by the other studies because they haven't done them yet. If the results for that sub-sample resemble those for the full sample, then no problem.
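
One way this sub-sample check could be implemented, sketched in R with dplyr; the data frame df and the columns presentation_order, study and likelihood_judgment are hypothetical placeholders, not the authors' actual variable names:

    # keep only participants who saw a given study first, so their judgments
    # cannot have been influenced by the other studies
    library(dplyr)

    first_seen <- df %>%
      filter(presentation_order == 1)   # hypothetical: position at which the study was shown

    # summarise the key judgment per study for this sub-sample and compare with the full sample
    first_seen %>%
      group_by(study) %>%               # hypothetical: identifier of the replicated study (1-7)
      summarise(mean_likelihood = mean(likelihood_judgment, na.rm = TRUE))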

 

11. O.K.

 

12. O.K.

 

13. I sympathise with the authors and I think their discussion of the issue is intelligent and appropriate. I agree with the decision to set alpha at .001 for exploratory analyses. For the analyses where .005 is used, I would suggest that, if they get results significant at .01 but not at .005, they could discuss these or at least list them, so that readers could get a feel for whether there is any likelihood of type 2 errors, but I'm happy for the authors to go with their own judgment on this.

 

14. O.K.

 

Overall, I would like the authors to bear my comment on point 4 in mind when writing up the results. The grammatical error commented on in point 7 should be corrected. The authors should consider the suggestion for further analysis in point 10, but I will leave it to them to decide whether to do it or not. They should also consider the suggestion made in point 13, but again it is up to them whether they do it or not. I have no further requests for changes.

 

Peter White

 


Evaluation round #1

DOI or URL of the report: https://osf.io/4rjpm

Version of the report: 1

Author's Reply, 26 Mar 2024


Revised manuscript:  https://osf.io/y94ug

All revised materials uploaded to:  https://osf.io/8zhcj/, updated manuscript under sub-directory "PCIRR Stage 1\PCI-RR submission following R&R"

Decision by Rima-Maria Rahal, posted 25 Feb 2024, validated 25 Feb 2024

Invitation to Revise: PCI RR 609


Dear Dr. Feldman,


I have now received three reviews of your submission on a replication project addressing Kahneman and Tversky (1973). In line with my own reading of your manuscript, the reviewers highlight important strengths of your outlined approach, but also note some areas for further improvement. Following these suggestions, I would like to invite you to revise the manuscript.


Most salient is the need to clarify questions regarding the 7-in-1 approach of conducting multiple replication attempts in one study, the sampling plan, and the nature of the replication and the evaluation of replication success. These issues fall within the normal scope of a Stage 1 evaluation and can be addressed in a careful and comprehensive round of revisions.


Warmest,

Rima-Maria Rahal

 

Reviewed by ORCID_LOGO, 24 Feb 2024

Review uploaded.


Reviewed by ORCID_LOGO, 18 Feb 2024

Overall notes:

I commend the authors for taking on this large registered report replication attempt. It is very obvious that much time and consideration has been put into this project, and while PCI reviews do not evaluate the importance of submissions, my personal opinion is that this is an important replication to undertake.

 

At its current stage, I do not believe that this project is ready for data collection. I have a number of concerns, suggestions, and comments that I believe will greatly improve the end product. Even though this is my first time reviewing a registered report (although I have been involved in and am leading one right now), it is my understanding that the review process is meant to be constructive and collaborative. If it seems that any of my comments below are blunt, please do not take them as indicative of anything other than my quick jotting down of ideas.

 

While my point-by-point comments can be found below, I wish to draw out a few themes and mention some concerns I have.

 

First, I believe that the manuscript can be organized in a much clearer way. I found that there was a great deal of repeated information due to the way that sections were outlined, and a lot of jumping back and forth was required by the reader. I left some suggestions below, but largely, I would try to work on cutting a great deal of text and consolidating repeated information. One major step would be going through all of the components of any one study in order (i.e., Study 1 manipulations, measures, deviations; Study 2 manipulations, measures, deviations; etc.), as opposed to grouping by component (manipulations for Studies 1, 2, 3, etc., then measures for Studies 1, 2, 3, etc.). While I did not note all occurrences of this, I do believe restructuring the entire manuscript would help with readability and reduce total length.

 

Second, I think that there should be more consideration put towards sample size, sensitivity analyses, and power analyses. There was a recent article in PSPR around considerations of power (Giner-Sorolla et al., 2024). Pulling from that article, and related research, I urge the authors to consider what power and sensitivity mean. Power is the probability of detecting an effect if there is an effect there, and is effect specific rather than study specific. The same holds for sensitivity analyses (which are truly just a different mathematical representation of the same equation): sensitivity is related to an effect, not a study. Multiple comparisons and analyses all have their own sensitivity analyses (or power levels). I would encourage a consideration of how multiple comparisons and analyses for any given study may affect the reliability of the sensitivity analyses reported, and what can be done about them.
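
To make this concrete, here is a minimal sensitivity-analysis sketch in R using the pwr package; the sample size of 800 and the .005 alpha level are taken from elsewhere in this report, while the choice of a correlation test and of 12 comparisons is purely illustrative:

    # smallest correlation detectable with n = 800 at alpha = .005 and 95% power
    library(pwr)
    pwr.r.test(n = 800, sig.level = .005, power = .95)

    # the same sensitivity analysis after a Bonferroni-style correction across,
    # say, 12 planned comparisons within one study
    pwr.r.test(n = 800, sig.level = .005 / 12, power = .95)

    # each test (correlation, t-test, proportion, ...) has its own detectable
    # effect size, so this is repeated per effect rather than computed once per study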

 

Third, I do not see this as a close replication, but rather a conceptual one. I discuss this more below but if the authors wish to argue that this is a close replication, I think there are some changes that have to be made. One of those is my fourth major concern.

 

Fourth, I do not love the design of this replication for two reasons. I am strongly opposed to all six (or seven, depending on how you want to count it) studies being run by each participant. Much of the cognitive biases and heuristics literature does show how knowledge and practice reduce the effects that biases have on individuals. I would venture a guess that having participants run through a number of prediction tests may influence their answers on later questions, and even if randomizing order and looking at order effects is done, I think that this reduces the ability to detect an effect if an effect is there. Additionally, the population in question (Prolific participants) is not a naïve group. I just downloaded some of my own data from Prolific. This study included U.S. residents who were over 18 and proficient in English and were not involved in our pilot test. Out of 1226 participants, the mean number of Prolific studies completed was 2194 (median 1719, SD = 1914). I think there is something to be said that this population has had more experience and exposure to heuristics and biases, and may not respond the same way. I would argue that a participant on Prolific who has completed 300 studies is not representative of the population, and over 75% of participants in my sample have. While there is nothing that can be done about the non-naïveté of Prolific participants, I would suggest that participants do not run through multiple studies (or only run through a subset of studies). If there is flexibility with the budget, I could even envision some participants running through only one study, some running through 2-4, and some running through all if there are interesting questions there. But I think that having participants run through all six studies is a bad idea.

 

Thank you for the opportunity to review this project. I am looking forward to seeing the next version, and to seeing this project through. If there is anything I can clarify, let me know. Additionally, please understand that everything in this review is my opinion, so take things with a grain of salt.

 

Minor suggestions/comments:

·       Page 7: First sentence feels like it’s trying to define both the representativeness heuristic (hereafter RH) and give results from the 1973 study all at once. I might suggest splitting it into two sentences, one that gives a clear one-sentence definition of RH followed by the 1973 seven-studies example.

·       Page 8: The groups in S1 are unclear; I might name them (as K&H did) to clearly indicate that the key results were between-subjects correlations between similarity and likelihood (and likelihood vs. base rates). It could be read as within subjects currently.

·       End of page 8/beginning of page 9: page number for direct quote would be helpful

·       Page 9: As opposed to wording “representativeness heuristic predicts…”, give the actual findings, and say “…, [in]consistent with what the representativeness heuristic would predict.”

·       Page 11: Sentence starting with “At the time of writing…” is a run on, I would split it up for readability

·       Table 2 Exp. 6: Indicate from where N was inferred (presumably df of t-test + 1?)

·       Page 21: I would choose another phrase than “Bayesian line” to describe S3. There is no one Bayesian line, maybe pull from original article wording which states “The curved line displays the correct relation according to Bayes’ rule.”

·       Page 21: for S4, should read “either adjectives or reports”

·       Page 21: S5 is unclear if it is between or within subjects. I would edit to read: “representing academic achievement, mental concentration, or sense of humor”

·       Table 3: Geographic origin of KH 1973 participants: they were recruited via the student paper at the University of Oregon unless stated otherwise (see the footnote on page 238 of the original study).

·       I would remove all text mentioning “HIT” in the Qualtrics survey, as HIT is specific to MTurk and not Prolific.

·       Related to the point above, the Qualtrics mentions “MTurk/Prolific/Cloudresearch” at the end. Are data being collected from multiple sources? If not, I would remove any references to other platforms.

·       I would encourage consideration of any exclusion criteria (i.e. US residents 18+, proficient in English, etc.) and to include those in text, including the attention checks

·       Page 49: How was the prior of 0.707 generated?


 

Major suggestions:

·       Intro: I think that all of the information is there, but I don’t love how it’s organized. While reading through the first time, I found myself jumping back and forth to remind myself of what was said previously. I would suggest a reorganization along the following lines: Start by overviewing the findings from all studies in KH 1973. Something similar to table 1, but outline them all at the start. Then, move into different replication attempts (primarily around S1 and S3) and inconsistencies there and the theories for why replications are less/more successful (salience of randomness, etc.). Move into importance of replicating this specific paper, and then include the overview of replication/extension. End with a section outlining deviations/extensions in replications.

·       Page 10: unclear if actually testing reproducibility, given that reproducibility is the reliability of a prior finding using the same data and same analysis (Nosek et al., 2022)

·       Why was feedback accuracy not manipulated in S1/2 as it was in KH 1973? I understand the importance of including self-perceived accuracy, but I think there’s something to be said about being told that you either were or were not accurate, and how that might influence usage of representativeness on likelihood. While I get that the original study found null effects of this manipulation, I feel that it is not true to the replication to remove it due to the deception being found “unnecessary and unconvincing.” I would suggest keeping it in to remain truer to the original study (which would also allow for assurance that there is not an unequal distribution in perceived accuracy). I might have perceived accuracy first, followed by (deceptive) feedback.

·       Similar to the introduction, I would consider a restructuring of the methods section to go study by study as opposed to a section for manipulations, one for measures, and one for extensions. The cognitive load of switching back and forth between studies might be lessened if it’s organized by study, with sub headers for manipulations/measures. This would also cut down significantly on the text.

·       Page 43: I am unconvinced by the determination of successful/mixed/failed replication. I suggest considering whether certain studies hold more or less weight, and what that might mean for the representativeness heuristic. I would argue that this determination can be made almost only post hoc, due to the vast differences in potential outcomes (7 studies, multiple analyses, etc.). Perhaps a better metric could be whether a specific study (or analysis) replicated, and whether the evidence in aggregate seems to indicate that RH as a whole replicates. I don’t have an answer on how to generate metrics for the second option (whether RH as a whole replicates), but the cutoffs given are unconvincing to me. What would happen if every study gave non-significant findings in the same direction as hypothesized? Someone might argue that power was too low but on aggregate the effect exists… Or what would happen if there are effects of the order or something of that sort?

·       Page 43: I would not argue that this is a close replication by LeBel et al.’s criteria. The population is different, the context is different, the setting is different, the procedure is different, and there are differences in operationalization and stimuli. To me, this is a conceptual replication. There is nothing wrong with that, but I would represent it as it is. LeBel et al. state that a close replication is when the IV or DV stimuli are different, but a conceptual (far) replication is when the IV or DV operationalization or population is different.

·       I found the discussion of power/sensitivity/sample size to be a bit removed from the analyses: sensitivity analyses do not take into account the multiple effects generated. I would suggest a careful consideration of what effect the sensitivity analysis is generated in relation to, and what to do to adjust for multiple comparisons.

 

 

Stats Comments

·       I would highly recommend using the groundhog package (https://groundhogr.com/) to ensure reproducibility of all code. This would allow for version control of packages (see the sketch after this list).

·       I would suggest not using “99” as a code for missing age if that is a valid age in the dataset. Consider an obviously implausible value such as “999”, or commonly used codes such as “NA” or “.”

·       I would recommend installing tidyverse instead of individually installing ggplot2, haven, knitr, dplyr, purrr, etc.

·       If the data is collected via Qualtrics, why is it imported via a .sav file as opposed to an open format such as .csv?
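
A minimal sketch of two of the suggestions above (version-controlled package loading with groundhog, and exporting an open .csv alongside the .sav file); the file names and the pinned date are placeholders, and the call assumes groundhog's standard groundhog.library() interface:

    # load packages as they existed on a fixed date, for reproducible package versions
    library(groundhog)
    groundhog.library(c("haven", "readr"), date = "2024-05-01")

    # read the SPSS export and also write an open .csv copy that can be reused without SPSS/PSPP
    raw <- haven::read_sav("qualtrics_export.sav")
    readr::write_csv(raw, "qualtrics_export.csv")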

 

 

References:

Giner-Sorolla, R., Montoya, A. K., Reifman, A., Carpenter, T., Lewis Jr., N. A., Aberson, C. L., ... & Soderberg, C. (2024). Power to detect what? Considerations for planning and evaluating sample size. Personality and Social Psychology Review, 10888683241228328.

 

Nosek, B. A., Hardwicke, T. E., Moshontz, H., Allard, A., Corker, K. S., Dreber, A., ... & Vazire, S. (2022). Replicability, robustness, and reproducibility in psychological science. Annual Review of Psychology, 73, 719–748.

Reviewed by ORCID_LOGO, 23 Feb 2024