DOI or URL of the report: https://osf.io/gj87u/?view_only=c25468dd21a245139581832cf95f55ea
Version of the report: 001_Manuscript_Revised_221013
One of the original reviewers was available to evaluate your revised manuscript, and as you will see the reviewer is satisfied with many aspects of your revision (as was I based on my own reading). You will find some remaining points to clarify concerning the rationale and methodology, including some specific aspects of study design and measurement. Provided you are able to respond to these remaining points thoroughly in a further revision and response, Stage 1 in-principle acceptance should be forthcoming without requiring further in-depth review.
I thank the authors for the substantial edits made to the paper, and I think the design and background are much more clearly justified now. I have a few comments on some of the specifics below.
First line: “Across domains, older children appear to be more intentional in their actions…” How old are older children here?
Minor point: In the Abstract, I assume what is meant is "assigned to one of two conditions"?
p. 6 lines 214-215. Why is the effect of prompting expected to be stronger among younger children? Is that simply because the older children are expected to be more likely to generate an efficient test without prompting so there is less room for improvement? Or is there something beyond that?
Line 224 As I read this, I was curious about the operationalization of “plausible explanation” but I know that is discussed further later.
Line 227 Explore question – I assume there is an expectation that the vast majority would want to explore? Is it a problem if too many children say no? Is there a reason you are not just asking directly how they would test if they wanted to find out the truth of the claim? That is basically the same question asked in the second task, but with an opportunity for an open-ended response rather than a choice among fixed options. I also note you have no research questions explicitly looking at the responses to the Explore question, just the follow-up Design question.
Line 309 Prediction 4 also predicts an age effect/interaction.
Lines 319-328 I like the addition of the separate test of three surprising claims with multiple choice options to confirm children of all ages can accurately assess testing strategies.
Line 406 “how certain or uncertain they were in the belief” – it is a dichotomous question of sure or not sure, not a rating, correct?
Line 679 – would a mechanistic but physically inaccurate answer count as plausible?
Table 1 is a helpful study template.
For Q 3 in the table, do you mean Design average is the DV? Reasoning average is only for the prompted condition. But then Q 4 analyses seem to address many of the same questions, so I am a little unclear on the distinctions here.
I wondered about pilot testing of stimuli, confirming these are often surprising, though I know you’ve made reference to developing your stimuli based on previous studies, so if that is the source, perhaps that could be noted explicitly.
Overall, though, I am much more supportive of the current iteration of the proposal and think it has the potential to add to our understanding of children's developing reasoning and metacognitive skills.
DOI or URL of the report: https://osf.io/j3mtx?view_only=e5b296522fad4a7a8ce572d15ed5fac0
I have now obtained two very constructive and helpful reviews of your submission. The reviewers agree that the research question is interesting and valuable to answer. However, the evaluations then become more critical, noting substantial areas of concern that cut across the full range of the Stage 1 criteria (1A-1E). I won't list all the issues here, but based on my reading of the manuscript and reviews, the most substantial issues are: clarification of the hypotheses and wider rationale (a source of confusion for both reviewers); the suitability of the scaffolding manipulation for answering the research question; justification of a range of design decisions; and more detailed reporting of sample size planning. For this last point, please include the specific parameters of the power analysis in the manuscript, as it wasn't entirely clear to me how the comparisons associated with each hypothesis test sit within the broader architecture of the multi-level modelling. Including simulated data and code for the multi-level modelling would be ideal for clarifying this with reviewers.
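To illustrate the kind of simulation-based power analysis I have in mind, here is a minimal, stdlib-only sketch. To stay self-contained it deliberately simplifies: trials nested within children are simulated with a child-level random intercept, but the analysis aggregates to per-child means and applies a normal-approximation two-sample test rather than fitting the full multi-level model. All effect sizes, trial counts, and variance components are illustrative placeholders, not values from the manuscript.

```python
# Simulation-based power sketch for a scaffolded vs. unscaffolded contrast,
# with trials nested in children (child-level random intercepts).
# NOTE: all numeric parameters below are hypothetical placeholders.
import random
import statistics
from statistics import NormalDist

random.seed(1)

def simulate_child_means(n_children, n_trials, fixed_effect,
                         sd_child=0.5, sd_trial=1.0):
    """Return one per-child mean score per simulated child."""
    means = []
    for _ in range(n_children):
        intercept = random.gauss(0.0, sd_child)  # child-level random intercept
        trials = [fixed_effect + intercept + random.gauss(0.0, sd_trial)
                  for _ in range(n_trials)]
        means.append(sum(trials) / n_trials)
    return means

def welch_p(a, b):
    """Two-sided p-value from a normal-approximation Welch test on means."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def power(n_children=40, n_trials=6, effect=0.5, n_sims=500, alpha=0.05):
    """Proportion of simulated datasets in which the effect is detected."""
    hits = 0
    for _ in range(n_sims):
        scaffolded = simulate_child_means(n_children, n_trials, effect)
        control = simulate_child_means(n_children, n_trials, 0.0)
        if welch_p(scaffolded, control) < alpha:
            hits += 1
    return hits / n_sims

print(power())
```

In the manuscript itself, the same simulated datasets would instead be fitted with the planned multi-level model (e.g., lme4 or statsmodels), which is exactly what would make the mapping from hypotheses to model terms transparent to reviewers.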
For a regular manuscript reporting a completed study, a set of reviews this critical would likely result in rejection, but the advantage of the Registered Reports process is that it offers the opportunity for authors to work constructively with reviewers to avoid and resolve concerns before they become roadblocks. In this case, although substantial work is needed, I believe the manuscript is sufficiently promising to invite a major revision and response, which will then be returned to the reviewers for re-evaluation.
Review of To test or not to test: Uncertainty and information seeking following surprising claims.
The proposed study aims to investigate whether improvements in uncertainty awareness or experimentation abilities drive developmental change in children’s information-seeking following surprising claims. It is an intriguing idea, and a better understanding of the mechanisms of development in children’s experimental/exploratory inquiry would be of great value to the field. Unfortunately, I was disappointed by the content of the current proposal. The descriptions of the competing hypotheses were imprecise and occasionally confusing, and I am not convinced that the experimental manipulation is appropriate for detecting differences in the constructs of interest.
Below, I explain these concerns in more detail. They are roughly in the order they occur in the manuscript, and I have included the Stage 1 Review Criteria they speak to where appropriate.
Thank you for the opportunity to read and provide feedback on this proposal. I hope my comments are helpful in clarifying the design for this genuinely intriguing line of investigation. — Elizabeth Lapidow
Introduction.
Page 2. [Criteria 1A] The introduction is vague about what differentiates older and younger children's behavior. Paragraph 3 mentions "propensity to verify a claim," but the example in paragraph 2 is about using evidence, not generating it. Given that it is the focus of the current investigation, it is necessary to explicitly define what changes and when.
Line 36-47. I suggest the authors expand and elaborate their review of the previous literature in this paragraph. In particular, the statement that "only 5-6-year-olds make use of the information they acquire" during exploration following surprising claims seems very general for only being evidenced by one study? I would also draw the authors' attention to Köksal-Tuncer & Sodian (2018), in which the majority of 3-6-year-olds generated both appropriate empirical evidence and verbal arguments to disconfirm an informant's claims.
Page 3. [Criteria 1A] I don't find the suggestion that lack of meta-cognitive awareness would result in 'single rather than comparative object exploration' intuitive. Wouldn't taking half the appropriate actions suggest that children are sensitive to their uncertainty (since they choose to act on it) but fail to complete all the necessary actions? This would seem to be the implication of lines 51-55, but this reverses in lines 58-61, where the authors say that single object exploration results from a lack of awareness. It is also unclear what it would mean for children to recognize that a claim is surprising while lacking meta-cognitive awareness.
Page 3. [Criteria 1A, 1B] Despite reading the description of the Scientific-Reasoning Hypothesis several times, I am still unsure what ability the authors intend to capture. The stated definition is "the ability to design/carry out an effective empirical test," separate from whether one is meta-cognitively aware that a test is required. What ability is this? Is it the conceptual understanding of what experimentation is? Is it the executive and motor control needed to follow through a series of planned actions? This issue is further confused by the fact that "scientific reasoning" is widely used in the literature to refer to a large set of interconnected abilities -- some of which certainly require meta-cognitive awareness of uncertainty. Deanna Kuhn and others argue that explicit metacognitive understanding of the relationship between belief, uncertainty, and evidence is -required- for fully developed scientific reasoning. This lack of precision in one of the two competing hypotheses makes it critically difficult to evaluate the empirical design.
Page 4. The authors say, “Support for the Uncertainty-awareness-hypothesis comes from research showing that children’s early exploratory behavior indicates sensitivity to uncertainty.” In what way does this support the hypothesis? Wouldn’t children showing sensitivity to uncertainty in their exploration at 4-5 years old be inconsistent with a proposal that sensitivity to uncertainty is what develops to support exploration between the ages of 4 and 7?
Page 4. It is misleading to say that Lapidow et al. (2022) "found no association between reports of uncertainty and exploration decisions" since no such test of association was conducted in that study.
Page 5. [Criteria 1C] I have reservations about the proposal to use a scaffolding manipulation as an indication of the development of children's abilities. This presumes improvement in performance created by scaffolding is necessarily due to an underdeveloped awareness/understanding of what is scaffolded. This presumption is particularly concerning for the relationship between Hypothesis 2 and the Strategy Scaffolding condition. If children are successful at generating responses without options offered, this would certainly be evidence that they understand how to test uncertain claims. However, generating responses (of any kind) involves demands that selecting responses does not, including verbal ability, working memory, and attention. None of these are specific to experimentation ability and could readily lead to children performing better in the Scaffolded condition irrespective of their grasp of correct experimentation. Indeed, work by Azzurra Ruggeri and colleagues shows that 4- to 5-year-olds' -evaluation- of information seeking via question-asking is sensitive to the same considerations as that of older children and adults, even though they are not yet able to -generate- questions of this complexity.
As a possible suggestion for revision -- a more promising approach to this investigation might be to employ separate assessments of the two constructs of interest and see if variation in performance on these measures can account for the difference in performance between older and younger children.
Methods.
Page 6. [Criteria 1C, 1E] I may be misunderstanding, but it seems like the three conditions were always presented to children in the same order — starting with no scaffolding and ending with scaffolding on both dimensions. If this is the case, it opens up the potential for confounding order effects. In particular, I worry that repeated exposure to/opportunities to think about the task question (as the trials are only very superficially different from each other) might improve children’s performance even without the added scaffolding.
Page 7-8. The authors provide a compelling reason for changing the task from first to third-person by having participants think about what a character in a story believes/should do, rather than themselves. However, it strikes me that there may be significant differences between awareness/sensitivity to uncertainty in oneself as opposed to others. The research cited in the introduction focuses on internal monitoring and meta-cognition. Similarly, “what do you think the protagonist should do?” is very different from past work examining children’s spontaneous actions. Can the authors expand their discussion of this change to include considerations of what it would mean if they found differences between this and past work?
Page 7. [Criteria 1E] Do the authors have a plan to rule out the possibility that the Uncertainty Scaffolding prompt simply clarifies the task goal rather than scaffolding participants’ recognition/understanding of uncertainty in the story?
Page 8. [Criteria 1D] What does “encouraging children’s explicit representation of epistemic uncertainty in the trials where no scaffolding is provided” mean? Isn’t the scaffolding meant to encourage/support the representation of epistemic uncertainty?
Minor Comments.
26 - The opening sentence of the introduction seems incomplete. Presumably it is meant to say control increases with age?
44, 48, 92 - The general language used to refer to age ranges in these sentences should be revised. Given the highly developmental nature of the claims proposed, the authors need to be very specific about what ages are referred to.
159 - “and to level children’s prior knowledge…” this sentence is unclear, maybe a word is missing?
This manuscript proposes a study examining young children's exploration and attempts at verifying information they have been given. The research builds on work suggesting a developmental shift in these skills, and aims to test whether this development is due to children's increased skill at effective comparisons (designing comparison tests) or to their increased skepticism about information they are told. In particular, the proposed task is one in which participant children learn about another child who is exposed to a surprising piece of information, and are then asked to recommend the next step.
I think the topic is interesting, but I have a number of issues I would like to see addressed to clarify the design and proposed analyses.
The researchers cite evidence that 5-6-year-olds who do seek new evidence after hearing a surprising claim can typically use the information. This is the study cited as evidence for a likely transition around ages 5-6, though the design is to compare 4-5-year-olds and 6-7-year-olds. I think that age choice could be more clearly explained.
The two potential explanations offered are that children get better at exploration because their metacognition improves, or because their scientific reasoning (operationalized as their ability to design an effective test) improves. In some ways, these are presented as mutually exclusive, as if both skills could not improve around the same ages. I understand the scientific reasoning hypothesis, in which good scientific reasoning would ideally lead to a clear, controlled test, which is clearly more informative than testing a single object. I am less clear about the argument behind the skepticism hypothesis. As I understand it, the argument is that children who recommend a single-object test might do so not because they lack an understanding of scientific reasoning, but because they are unaware that they are, or should be, skeptical when the information provided violates their expectations/prior knowledge. I read these sections a few times and still find the argument a little hard to follow. It needs to be explained and justified more clearly.
I am unclear about why the three conditions are no scaffolding, uncertainty scaffolding, and uncertainty and strategy scaffolding. I assumed when I started reading that there would be a scientific reasoning condition and an uncertainty condition, but that's not the design. Is there no strategy scaffolding as its own condition because the strategy is irrelevant if children don't recognize the uncertainty? Hypothesis 3 predicts a progressive increase from no scaffolding to uncertainty scaffolding to uncertainty and strategy scaffolding in younger children, presumably because they need help with each of the components, which suggests a more progressive development, rather than two equivalent competing hypotheses. That point should be spelled out and clarified.
In the proposed analyses, it seems a little unusual to have three conditions, and then have one test that conditions 1 & 2 show a different pattern by age, and another that conditions 2 & 3 show a different pattern by age. I understand why the hypotheses are proposed separately but is there not one test that can assess these all in one analysis, with likely post hoc follow-ups?
Coding of the intent to test – in two of the conditions, children can give an open-ended response, which will then be classified as either an intent to test, an intent to explore but not test, or something else (or a non-response). There is no mention of any assessment of inter-rater reliability of the coding, which should be included.
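For illustration, inter-rater agreement on a categorical coding scheme of this kind is commonly summarized with Cohen's kappa. A minimal sketch, using hypothetical category labels (not the manuscript's actual coding scheme):

```python
# Cohen's kappa for two coders' classifications of open-ended responses.
# The category labels below are illustrative placeholders only.
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders over the same items."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(freq_a) | set(freq_b)
    # Expected agreement if coders labeled independently at their marginal rates
    expected = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    return (observed - expected) / (1 - expected)

a = ["test", "test", "explore", "other", "test", "explore"]
b = ["test", "explore", "explore", "other", "test", "explore"]
print(round(cohens_kappa(a, b), 2))  # → 0.74
```

Reporting kappa (or a weighted variant) on a double-coded subset, with a pre-specified threshold for acceptable agreement, would address this concern.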
Also, why are you coding for multiple possibilities, but only exploring analyses that collapse the outcome responses into testing or no testing? If you plan to code it, why wouldn’t you look at that information to see what patterns may be shown by the additional types of responses? I don’t think you would necessarily need to have a clear hypothesis of what the data might look like, but ignoring part of the data also seems unnecessarily limiting.
Sample size – I am not an expert in multilevel models, but from my basic understanding, it is important to consider sample size at each of the levels, not just the overall participant sample size. At the very least, the model and its justification should be better explained and supported.
In looking at the stimuli in the appendix, I wondered about olfaction as compared to the other categories of stimuli. I realize the study participants wouldn’t have to actually manipulate objects, but asking someone to smell something unpleasant feels like something people might avoid for purely sensory reasons, independent of experimental design, in a way that would not be an issue with density and weight examples.
In sum, I think there are a lot of interesting ideas, but I am not fully convinced by the proposal as is, and would like to see better justifications and explanations for many parts of it, and perhaps some additional tweaks to the design.