Assessing the replicability of specific links between numeracy and decision-making
Revisiting the links between numeracy and decision making: Replication of Peters et al. (2006) with an extension examining confidence
Recommendation: posted 03 May 2022, validated 03 May 2022
Chambers, C. (2022) Assessing the replicability of specific links between numeracy and decision-making. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=165
Related stage 2 preprints:
Recommendation
- Advances in Cognitive Psychology
- F1000Research
- Journal of Cognition
- Meta-Psychology
- Peer Community Journal
- PeerJ
- Royal Society Open Science
- Swiss Psychology Open
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Evaluation round #1
DOI or URL of the report: https://osf.io/fc5q4/
Author's Reply, 29 Apr 2022
Revised manuscript: https://osf.io/8z6ga/
All revised materials uploaded to: https://osf.io/4hjck/, updated manuscript under sub-directory "PCI-RR submission following R&R"
Decision by Chris Chambers, posted 30 Mar 2022
Two expert reviewers have now evaluated the Stage 1 submission. The reviews are generally encouraging and the initial submission is already within striking range of meeting the Stage 1 criteria. The reviewers do, however, highlight a considerable range of areas that would benefit from refinement, from the clarification of methodological details, to strengthening the justification (and providing expanded discussion of limitations) concerning specific design decisions, to confirming the reliability of key measures, and improving the clarity of presentation. On the basis of these reviews, I am happy to invite a comprehensive revision that addresses all points.
Reviewed by Elena Rusconi, 22 Mar 2022
This is part of a larger project assessing replicability of popular findings in decision making research and involving a formative component. I applaud the scientific effort and its great formative value. I agree with the authors on the relevance of a replication of Peters et al.’s (2006) study and I am listing a number of comments/suggestions for improvement (a few minor, a few more substantial) below.
SNAPSHOT
Some of the original hypotheses have been reframed (e.g. numeracy as a continuous variable), but for a replication it may be better to use the original formulation first and list the reframed hypothesis as an extension.
Methods (differences):
The authors are planning to recruit participants online via Amazon Mechanical Turk. This is problematic, as there is no way to check whether participants use online shortcuts or a calculator to answer numerical questions.
Conclusions
It is mentioned that d will be transformed to r, in order to compare with the original effect size. However, this applies to the extension (the replication results may be directly comparable, and should drive conclusions on the replication outcome).
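For illustration, the d-to-r transformation mentioned above can be sketched as follows. This is a minimal sketch using the standard conversion for two independent groups, not the authors' own script; the function names and the example value of d are hypothetical. With equal group sizes the correction term is 4; with unequal sizes it is (n1 + n2)^2 / (n1 * n2).

```python
import math
from typing import Optional


def d_to_r(d: float, n1: Optional[int] = None, n2: Optional[int] = None) -> float:
    """Convert Cohen's d to a point-biserial correlation r.

    If group sizes are not supplied, equal groups are assumed (a = 4).
    Otherwise a = (n1 + n2)^2 / (n1 * n2).
    """
    if n1 is None or n2 is None:
        a = 4.0
    else:
        a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)


def r_to_d(r: float) -> float:
    """Inverse conversion, valid under the equal-group-size assumption."""
    return 2 * r / math.sqrt(1 - r ** 2)


# Hypothetical example value, not taken from the manuscript.
example_r = d_to_r(0.5)
```

Because the conversion depends on the group sizes, an r derived from d under the equal-groups assumption is only approximate when a dichotomized split produces uneven groups.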
STUDY DESIGN TABLE
From this table it becomes clear that the original analyses will be performed too. The replication outcome should be decided on the basis of those analyses.
Does the power analysis take into account the extensions or will these be regarded (and discussed) as a secondary/exploratory component of the study?
Relationship between numeracy as a continuous variable and decision making outcome/confidence: this is investigated via linear regression, but the relationship might not be linear. Please clarify that confidence judgments refer to the decision making tasks. Conclusions about the continuous variable analysis outcome should not override the conclusions about the replication of Peters et al.’s study.
MAIN TEXT
> INTRODUCTION
The section on “affect and numeracy” could be better tuned to the manipulation of Peters et al.’s experiment 4, where the focus is on the disadvantage of being highly numerate under certain conditions.
Section “choice of study for replication”. Overall statements should be backed up more robustly.
Low power: please clarify what the target effect of interest could have been and why the specific number of participants was insufficient for each study, either here or later, in each specific study section (please note that some additional or supplementary analyses reported in the original paper may be exploratory/of secondary importance, and this should be taken into account when evaluating power in the original study).
Methods: Peters et al. stated that dichotomization was introduced to address data skewness in Study 1 and maintained it for the other experiments as well; had the sample been larger for Study 2 and Study 3, it looks like the authors would have considered other splits (such as a split based on quartiles). Please clarify if/why this procedure is not suitable and/or ideal. It would be useful to mention here whether you will calibrate your replication so that it will achieve sufficient power to test for a relation between a continuous numeracy variable and decision making biases, and that you will also replicate the original analysis for a direct comparison with Peters et al.’s study (i.e. with numeracy dichotomized).
- One of the major methodological differences between the proposed replication and the original study (online vs in-person testing) should be mentioned here and discussed. Another important methodological difference (all participants will take part in all the tasks) should also be mentioned, along with its possible advantages and disadvantages.
- The extension about study-specific self-efficacy is very interesting. However, the introduction of a summary confidence judgment would certainly not interfere with the replication effort if a between-participants design was adopted and if participants were not informed about having to provide a confidence judgment at the end. Could its introduction affect decision making processes on subsequent numerical tasks in a within-participant design, such as the proposed one? (e.g. Boldt, Schiffer, Waszak & Yeung, 2019; Confidence predictions affect performance confidence and neural preparation in perceptual decision making).
> METHODS
Power analysis:
p.21 “We then conducted a power analysis using G*Power (Faul, Erdfelder, Lang, & Buchner, 2007) for the statistical tests in each of the decision-making risk paradigms separately (i.e. framing effect, frequency-percentage effect, ratio bias and bets effect).”
The four studies are not conducted with separate groups of participants. What is the familywise alpha across the four main tests? Please state whether you have included any corrections to your power calculations.
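To make the familywise question concrete, the following is a minimal sketch of how the error rate grows across four tests and how common corrections adjust the per-test threshold. The nominal alpha of 0.05 and the example p-values are assumptions for illustration, not values from the manuscript. The uncorrected familywise formula assumes independent tests, which will not hold exactly when the same participants complete all four tasks; the Bonferroni and Holm procedures remain valid under dependence.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / m


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-test threshold giving exactly alpha familywise under independence."""
    return 1 - (1 - alpha) ** (1 / m)


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns reject decisions in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Four main tests (framing, frequency-percentage, ratio bias, bets),
# with an assumed nominal alpha of 0.05.
uncorrected = familywise_error_rate(0.05, 4)
per_test = bonferroni_alpha(0.05, 4)
decisions = holm_reject([0.01, 0.04, 0.03, 0.005])  # hypothetical p-values
```

If a corrected per-test alpha is adopted, that value rather than 0.05 would be the one to enter into the power calculation for each paradigm.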
p.21: Please state which result (effect within the two-way between-subject ANOVA) was the one requiring the larger sample size.
p.22 Please list all the recruitment criteria and quality data checks beforehand (at the moment a list is provided but it ends with “etc.”). Please provide any threshold you will use to exclude participants after data collection, if based e.g. on Qualtrics diagnostic scores.
Table 3: The medium of Peters et al.’s study is paper and pencil, most likely in person, as the questionnaires were described as being “administered” rather than self-administered (p. 408) (perhaps this could be confirmed by Peters?). The year in which Peters et al.’s participants were tested is 2005 (if not earlier; p. 413), please amend.
> DESIGN: REPLICATION AND EXTENSION
In addition to the comment under the introduction section: the confidence rating is presented on the same page as the other questions, and all the answers can be changed after seeing/replying to the question about confidence. For a closer replication of Peters et al, the confidence rating could be required on a separate page, only after completion of a scenario/task.
Tables 4-7 (study design): should be checked and possibly reorganised to improve clarity and consistency (e.g. you could use the words “scenario” and “conditions” that are used in the main text, to achieve more clarity and distinguish between variable names and variable levels). E.g. for Study 1: IV2: Framing scenario or Frame (as in Peters et al.); IV2 – condition: Positive; IV2 – condition: Negative; also Numeracy is indicated as a between subjects IV like the Framing scenario though it is not manipulated but measured and no level can be indicated. Perhaps add a header with: manipulated variables for the first column and measures collected for the second. After reading the introduction, it is unclear whether the numeracy measure indicated in these Tables is the same as the original measure or a novel one – or both. For Study 2, IV2 should probably be indicated as Risk scenario (or Format, as in Peters et al.) rather than as “frequency-percentage effect” (condition 1: Frequency, condition 2: Percentage). Study 3 does not seem to have a manipulated IV. In Study 4 the manipulated IV could be Bet scenario (or Bet type) rather than “bet effect”.
> PROCEDURES
- The numeracy measure is always presented after the decision tasks in Peters et al. (2006), but this aspect is not preserved in the current replication. Why are the authors planning to present the numeracy scales first?
- Instructions ask participants to answer via their gut feelings/intuitions (this is partly functional to prevent them from checking their answers online). But was this prime present in the original instructions? Would it not be more ecological to let low- and high-numerate individuals choose their preferred approach?
> MANIPULATIONS
Table 8 should be updated with the deviations from protocol outlined above.
Page 32: please provide reasons why you think that dichotomization is the main weakness (over and above sample numerosity?).
Please specify whether linear regressions will be tested with both numeracy scale scores. And what is plan B, if the assumptions for linear regressions are not met by your data?
> RESULTS
Original analyses
The dichotomization was originally performed on the basis of data distribution (median split). Why do the authors plan to apply Peters et al.’s thresholds rather than a median split (as in Peters et al.’s study) to their own distribution of scores? The latter might improve their chance of having even groups to compare, which is important given the use of parametric stats. It could well be that their median split will coincide with that of Peters et al., but it could also be that their sample is more (or less) numerate on the whole (information about demographics in Peters et al. is scarce). Related to this point, a weakness in Peters et al.’s paper is the (likely) uneven distribution of experimental conditions between numeracy groups (the division in groups was performed after assigning conditions to participants); the numbers in each group are not reported in the original paper.
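The difference between the two dichotomization rules can be illustrated with a short sketch. The scores and the fixed cutoff below are hypothetical values chosen for illustration and are not taken from either study; the point is only that a sample-based median split tends to give more even groups than a threshold imported from another sample.

```python
from statistics import median


def split_by_threshold(scores, threshold):
    """Label each score 'high' if it exceeds the threshold, otherwise 'low'."""
    return ["high" if s > threshold else "low" for s in scores]


def median_split(scores):
    """Dichotomize at the sample's own median (ties at the median go to 'low')."""
    return split_by_threshold(scores, median(scores))


def group_sizes(labels):
    """Count how many participants fall into each numeracy group."""
    return {"low": labels.count("low"), "high": labels.count("high")}


# Hypothetical numeracy scores for ten participants.
scores = [3, 5, 6, 7, 7, 8, 9, 9, 10, 11]

own_median_groups = group_sizes(median_split(scores))
# Hypothetical cutoff standing in for a threshold taken from an earlier study.
fixed_cutoff_groups = group_sizes(split_by_threshold(scores, 8))
```

Reporting the resulting group sizes per experimental condition would also address the concern about uneven cell counts.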
New analyses
Clarify what numeracy scale you are using for regressions.
“The results of the Rasch-based numeracy scale generated by simulated data were hard to analyze. 958 out of 1000 participants achieved zero marks and the rest of them all achieved one mark only.”
Re: Rasch-based numeracy scale, could a list of 1000 random numbers ranging between min and max score be generated in Excel instead?
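The same idea can be sketched outside Excel. The snippet below draws 1000 integer scores uniformly between a minimum and maximum; the bounds of 0 and 8 and the seed are assumptions for illustration and should be replaced with the actual scoring range of the scale used.

```python
import random


def simulate_uniform_scores(n, min_score, max_score, seed=None):
    """Draw n integer scores uniformly from min_score to max_score (inclusive)."""
    rng = random.Random(seed)  # a local generator keeps the draw reproducible
    return [rng.randint(min_score, max_score) for _ in range(n)]


# Assumed range 0-8 and an arbitrary seed; adjust to the real scale.
scores = simulate_uniform_scores(1000, 0, 8, seed=1)
```

Uniform scores are not a realistic model of numeracy, but they spread simulated participants across the whole range, which is enough to check that the analysis pipeline runs.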
SUPPLEMENTARY MATERIALS
p.4: “Study 1” please eliminate “mixed within-subject” (this ANOVA should have 2 between-subject factors; applies to Table 1 on p.5; even though in Peters et al it is indicated as a “repeated measures” ANOVA); please eliminate or replace “with five students”; correct typo “numerach”
p.17: Please specify the topic of the “funnelling” questions.
p.32: Please clarify whether participants will be able to answer from a mobile phone and whether you plan to include participants answering from a mobile phone.
Please clarify whether, after 8 minutes have passed, participants will still be able to complete the survey or will be logged out automatically.
“The expected completion time was set at 5 minutes in advance”: Does this mean you allowed 5 extra minutes compared to the expected completion time? What age were the 30 pilot participants? How realistic is the expected completion time for younger/older participants? (age range: 0-100)
Reviewed by Daniel Ansari, 30 Mar 2022
I very much enjoyed reading this very well-written and clearly organized Stage 1 Registered Report entitled "Revisiting the links between numeracy and decision making: replication of Peters et al. (2006) with an extension of examining confidence."
As far as I can tell this Stage 1 Registered Report meets all the necessary components of a Stage 1 Registered Report. The sampling and analyses plans are both very clear. I also thought that the justification/rationale for conducting the replication study was very clear. It was also helpful to have the results from the randomized dataset as this helps to understand what the Stage 2 manuscript will look like following the actual data collection.
Given my overall positive impression of this Stage 1 Registered Report manuscript, I only have a few concerns and suggestions for improvement.
1. While I fully agree with the authors that the hypotheses are best investigated using correlations and regressions rather than dichotomizing on the basis of numeracy, I would have liked to have seen a clearer justification for this approach. The authors do not directly tell the reader why a continuous approach is a better way of handling this kind of data and these research questions. I think it would be instructive if the authors provided more justification for this and pointed out the limitations/problems of a dichotomous approach and, by extension, the advantages of treating numeracy as a continuous variable.
2. While the internal reliability of the two numeracy scales is reported, no such data are reported for the four manipulations. Is this a concern? Would we not want to know the reliability of both the independent and dependent variables? If the manipulations are unreliable, this might explain any null results that are uncovered. Related to this, I was wondering how suitable the measures obtained from these manipulations are for individual differences questions, because they are derived from experimental paradigms. It has been established that measures derived from experimental research are often not well suited for individual differences research (see: https://link.springer.com/article/10.3758/s13428-017-0935-1).
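As a point of reference for the reliability question, internal consistency of a multi-item scale is commonly summarised with Cronbach's alpha. The sketch below is a generic implementation with hypothetical response data, not the authors' analysis. It applies to multi-item measures such as the numeracy scales; an outcome that consists of a single response per participant would need repeated measurement (e.g. test-retest) before any reliability estimate could be computed.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one row of k item scores per participant)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of participants' total scores.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: five participants, three items scored 0/1.
data = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
]
alpha = cronbach_alpha(data)
```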