PCI-RR Review of “A fragmented news environment and the illusion of knowledge” by Federica Ruzzante, Folco Panizza, and Gustavo Cevolani
In this Stage 1 RR, the authors propose a study of the connection between news exposure and the knowledge illusion. In brief, the study aims to test whether exposure to news articles about different topics will increase perceived knowledge about the topics, while actual knowledge will not increase to the same extent, leading to an illusion of knowledge. Additionally, the authors propose that the emotional intensity of the topics will moderate the effects, with stronger effects for emotionally intense topics. I find the research question interesting and the design generally solid, but I have some concerns about the methodology and analysis pipeline, and about the level of methodological detail.
1A. The scientific validity of the research question(s).
I find the research question to be interesting and scientifically justifiable. The authors lay out a clear rationale for their investigation of the topic of news exposure and the illusion of knowledge. The study connects well to previous research on similar topics, and uses a clear experimental design.
1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable.
The study proposes four hypotheses. Hypotheses 1 and 3 concern main effects of exposure to news, and suggest that exposure increases perceived knowledge (H1) and therefore also illusion of knowledge (H3). These two hypotheses seem well-developed and follow logically from the reviewed literature. I also appreciate that the hypotheses are “unpacked”, as seen for instance through the different equations described at the end of p. 3 and beginning of p. 4 (it is good to specify differences in perceived knowledge separately for exposed and non-exposed topics, and that the delta for perceived knowledge for exposed topics will be larger than 0).
Hypotheses 2 and 4 concern emotional intensity as a moderator. I find that these hypotheses lack clear justification. The only discussion of the background for these hypotheses that I can find is on page 3, paragraph 3, where it is pointed out that previous studies do not control the topics used as stimuli (a good point!), and in the final sentence: “Following Park’s intuition (2001) we believe that the key characteristic that might inflate perceived knowledge is the perceived involvement of the individual, regardless of the topic being assessed: whether it is political, scientific, health-related, and so on.”
This strikes me as insufficient for proposing the emotional intensity hypotheses. It is not clear from these general observations that the effects of exposure should be stronger for emotionally intense topics, and the authors should expand on why they propose hypotheses in this direction. There are studies on related topics, for instance this study (https://onlinelibrary.wiley.com/doi/pdf/10.1002/bdm.1836?casa_token=5tsruAIICHwAAAAA:OSeAXlRKWZhaCOlmS4M84YW9E2wvAuav7RgQn492Vhv3Ksg_IUfrQFkZs5ZrPfHSbGQZ5gWDLYnI-hQ), which argues that (irrelevant) emotions during learning can inflate perceived learning. More generally, research on emotions and memory (e.g., flashbulb memories) could inform the hypotheses for the role of emotional intensity in the proposed study.
Note also that for Hypotheses 3 and 4, the term “ki” is used in the equations as an abbreviation of illusion of knowledge. However, this term is only defined two pages later, in the description of the measures. Please introduce this term together with the equations to improve comprehension.
1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).
I have several concerns when it comes to the methodology and analysis pipeline.
There are some inconsistencies in the illusion of knowledge measure. The illusion of knowledge is stated to be calculated as “the difference between perceived knowledge at T2 and actual knowledge, that is the proportion of correct answers: ki = pkT2 – score of factual knowledge”. Perceived knowledge is measured using a scale from 1 (nothing) to 100 (everything). Factual knowledge is measured as the proportion of correct answers, and so goes from 0 to 1.
To make the illusion of knowledge measure more meaningful, I think some changes need to be made. First, the perceived knowledge scale should go from 0 to 100, so that the bounds are similar between perceived and factual knowledge. As it stands, a score of 0 is possible for factual but not for perceived knowledge. Second, and more importantly, the two measures should both go from 0 to 100 or both from 0 to 1. Otherwise, it will be harder to interpret the illusion of knowledge measure (e.g., someone who scored 50 on perceived knowledge and answered half of the factual questions correctly, a proportion of 0.5, would receive an illusion of knowledge score of 49.5). I think converting the factual knowledge measure to a 0 to 100 scale makes most sense.
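As a minimal sketch of the rescaling suggested above, both measures can be put on a 0 to 100 scale before the difference is taken. The function name, argument names, and the 10-item quiz in the usage line are hypothetical and chosen for illustration only; they are not taken from the manuscript.

```python
def illusion_of_knowledge(perceived, n_correct, n_items):
    """Return perceived minus factual knowledge, both on a 0-100 scale.

    perceived : self-rated knowledge at T2, assumed to be on a 0-100 scale
    n_correct : number of factual questions answered correctly
    n_items   : total number of factual questions (hypothetical quiz length)
    """
    if not 0 <= perceived <= 100:
        raise ValueError("perceived knowledge must be between 0 and 100")
    if n_items <= 0 or not 0 <= n_correct <= n_items:
        raise ValueError("n_correct must be between 0 and n_items")
    # Convert the proportion of correct answers to a 0-100 score.
    factual = 100 * n_correct / n_items
    return perceived - factual


# A participant who rates their knowledge at 50 and answers half of a
# (hypothetical) 10-item quiz correctly shows no illusion of knowledge.
print(illusion_of_knowledge(50, 5, 10))  # 0.0
```

With matched scales, a score of 0 means perceived and factual knowledge agree, positive scores indicate overestimation, and negative scores indicate underestimation.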
Covariates and control variables
I think the rationale behind including these variables is unclear. The authors do not describe any background about these measures, and do not describe any hypotheses for how they would influence the results. I think it should at the very least be stated explicitly that these are included for exploratory purposes, unless there are some hypotheses for them.
Furthermore, the statement that these variables “will be included as covariates and control variables” is not very specific. Will these variables be included as covariates/controls in all analyses? Or will you first test a model using only experimentally manipulated variables, and later include these as controls? There is no mention of any of these variables in the table on page 8 (note that the table number is missing here). The role of these variables in the analyses should be clearly specified. The current description leaves room for analytical flexibility.
The authors also note (p. 7) that “Some extra control questions will be administered to check whether subjects had paid attention to the experimental stimuli and environment”. It would be good to specify what these control questions are, and whether they will be administered at T1, at T2, or both.
No rules for data inclusion/exclusion are described, except for the mention on page 9 that incomplete submissions will be deleted. I find the statement about deleting incomplete submissions to be ambiguous. I assume that a response from a participant who, for instance, failed to answer a single item in the social media use measure would not be deleted, but this is not clear from the manuscript. Again, to prevent analytical flexibility, the authors should be clear about what “incomplete submissions” means. Is it restricted to the main dependent variables? Is there a cut-off point (e.g., more than 5% or 10% of responses missing) beyond which a participant will be excluded?
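To illustrate how such a rule could be pinned down in advance, here is a minimal sketch of one possible exclusion rule. The rule itself (exclude if any main dependent variable is missing, or if overall missingness exceeds a cut-off), the 10% default, and all names are assumptions for illustration, not the authors' plan.

```python
def proportion_missing(responses):
    """Share of missing answers; None marks a missing response."""
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(r is None for r in responses) / len(responses)


def should_exclude(responses, key_indices, max_missing=0.10):
    """Hypothetical preregistered rule for an 'incomplete submission'.

    responses   : list of one participant's answers (None = missing)
    key_indices : positions of the main dependent variables
    max_missing : assumed cut-off for overall missingness (10% here)
    """
    # Any missing main dependent variable leads to exclusion.
    key_missing = any(responses[i] is None for i in key_indices)
    # Otherwise, exclude only if overall missingness exceeds the cut-off.
    return key_missing or proportion_missing(responses) > max_missing


# One skipped secondary item out of 20 (5% missing) does not trigger exclusion.
answers = [3, 4, None] + [5] * 17
print(should_exclude(answers, key_indices=[0, 1]))  # False
```

Whatever rule the authors prefer, stating it at this level of precision at Stage 1 would remove the ambiguity.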
More generally, rules for data exclusion should be described. This also relates to the “control questions” mentioned above: will participants be included if they fail these control questions? Why? Why not?
I find that the justification of the effect size lacks detail. The current manuscript refers to an effect size of f = 0.15, stating “the effect size was adjusted based on the results obtained by Schäfer in a similar experimental protocol”. I looked briefly at the findings from Schäfer (2020), and found only one effect size, η² = 0.01, which converts to a Cohen’s f = 0.10 (using the easystats package in R). So I wonder if I have misunderstood, if the authors are referring to a different effect size, or if something else is going on.
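For transparency, the conversion I used follows the standard relation f = sqrt(η² / (1 − η²)). The sketch below reproduces it in plain Python; the function names are my own, and the same formula applies if the reported value is a partial η².

```python
import math


def eta_squared_to_cohens_f(eta_sq):
    """Convert (partial) eta squared to Cohen's f."""
    if not 0 <= eta_sq < 1:
        raise ValueError("eta squared must be in [0, 1)")
    return math.sqrt(eta_sq / (1 - eta_sq))


def cohens_f_to_eta_squared(f):
    """Convert Cohen's f back to (partial) eta squared."""
    if f < 0:
        raise ValueError("Cohen's f must be non-negative")
    return f ** 2 / (1 + f ** 2)


# The single effect size I found in Schäfer (2020).
print(f"{eta_squared_to_cohens_f(0.01):.2f}")  # 0.10
# The eta squared implied by the f = 0.15 used in the power analysis.
print(f"{cohens_f_to_eta_squared(0.15):.3f}")  # 0.022
```

In other words, f = 0.15 corresponds to an η² of roughly 0.022, about twice the value I found in the cited paper.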
In general, I find this part to lack detail. The authors mention that the sample size is computed over the main and interaction effects, but this should be further explained (the necessary sample size would presumably differ between main and interaction effects).
Another question here concerns the attrition rate. I am not well-versed in studies with a 2-week lag between experimental sessions, but my gut feeling is that 15% is a low estimate of attrition. It would be nice to know whether this expected attrition rate is based on data from similar studies, is a guess, or something else.
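To show how much the recruitment target depends on this assumption, here is a small sketch of the usual inflation formula, n_recruit = ceil(n_required / (1 − attrition)). The required sample of 200 is a made-up number for illustration, not the figure from the manuscript's power analysis, and the 30% alternative is likewise only an example.

```python
import math


def recruits_needed(n_required, attrition_rate):
    """Number of participants to recruit at T1 so that, after the expected
    dropout, about n_required remain at T2."""
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - attrition_rate))


# Hypothetical required sample of 200 complete cases.
print(recruits_needed(200, 0.15))  # 236
print(recruits_needed(200, 0.30))  # 286
```

If attrition turns out to be closer to 30% than 15%, the recruitment target in this example rises by about 50 participants, which is why the source of the 15% estimate matters.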
The table on page 8/9 is helpful, but there are some issues. First, the authors plan to use ANOVAs, or a Friedman test as a non-parametric alternative if assumptions are violated. However, to my knowledge a Friedman test cannot test for an interaction in the same way as an ANOVA, and it is thus unclear how the hypotheses proposing an interaction will be analyzed in case of violated assumptions. Perhaps other alternatives such as robust ANOVA could be used instead.
Another point concerns the interpretation of non-significant findings. The authors make the following statement: “If the test will result non-significant, we cannot rule out that the difference is negligible, that is: there is no difference in the assessment of perceived knowledge of the selected topics before versus after the exposure. If so, it may be that our experiment failed to elicit such an effect, and further analysis will be then required to investigate the results, taking into account other variables.”
This is an ambiguous statement. Which further analyses would be required? Do the “control variables” come into the picture here? I think the authors should consider whether equivalence testing or Bayesian analysis could help them interpret non-significant findings.
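As one illustration of what an equivalence test could look like, the sketch below runs two one-sided tests (TOST) on paired T2 − T1 differences in perceived knowledge. It uses a large-sample normal approximation so that it needs only the Python standard library; a t-based version would be preferable for small samples. The equivalence bounds of ±5 scale points are an assumed smallest effect size of interest, and the data are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def tost_paired(diffs, low, high, alpha=0.05):
    """Large-sample TOST: is the mean difference inside (low, high)?

    diffs     : per-participant differences (e.g., T2 minus T1)
    low, high : equivalence bounds (assumed smallest effect of interest)
    Returns (p_value, equivalent), where p_value is the larger of the
    two one-sided p-values.
    """
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)
    m = mean(diffs)
    # Test 1: H0 mean <= low, rejected when the mean is clearly above low.
    p_lower = 1 - NormalDist().cdf((m - low) / se)
    # Test 2: H0 mean >= high, rejected when the mean is clearly below high.
    p_upper = NormalDist().cdf((m - high) / se)
    p_value = max(p_lower, p_upper)
    return p_value, p_value < alpha


# Made-up differences that scatter tightly around zero.
fake_diffs = [2 if i % 2 == 0 else -2 for i in range(100)]
p, equivalent = tost_paired(fake_diffs, low=-5, high=5)
print(equivalent)  # True
```

A significant TOST result would support the claim that any change in perceived knowledge is smaller than the prespecified bounds, which is a stronger statement than a non-significant ANOVA alone.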
In general, the analysis pipeline in the current version of the manuscript is still relatively open. I think the authors should formulate a more detailed analysis plan, and ideally should provide open code for their analyses, using simulated data.
1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.
I think the study procedure is mostly described in enough detail. However, it would be helpful to have access to the full materials for the study. Note also that in the appendices, there is a mix of English and Italian when it comes to measures and topics. For better replicability, I think all materials should be available in English.
As noted above, I also think a more detailed analysis plan, preferably with code, would be very helpful.
1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).
The question about emotional involvement can be said to be a manipulation check for the emotional intensity variable. Here, one would obviously predict higher involvement for high intensity than for low intensity topics. Similarly, the baseline knowledge scores would presumably also differ between low and high knowledge topics. It would be good to specify these points in the manuscript.
Additionally, it could be good to include a manipulation check for exposure, for example by asking (after completion of other measures) which of the topics the participant (remembers) reading about in the experiment. There may be better ways to include some positive control for news exposure, but the authors should at least consider whether and how they could do this.
I think the topic is of interest and the proposed design is mostly good, but the current version lacks detail for some key aspects of methodology and analysis. I hope the authors find my review helpful.