This is a very comprehensive stage 1 proposal that combines computational modelling and pilot data to motivate a study of the relationship between input structure and generalisation in an online experimental paradigm with 7-8-year-old English-speaking children. The authors will test the hypothesis (generated from their theory of discriminative, error-based learning) that children will learn the meaning and use of Japanese spatial adpositions more effectively when there is more variability in the use of the nouns within spatial sentences. They also propose to test a number of secondary hypotheses; most notably, an empirically generated hypothesis that skewed distributions might be as good (or even better) for learning generalisations than highly variable input. The report satisfies all the necessary criteria in my opinion; the authors have done an excellent job. Below I summarise my comments under the 5 headings/areas suggested in the Guidelines for Reviewers, before finishing with some more general comments.
1A. The scientific validity of the research question(s).
The research question is scientifically valid and is detailed, with sufficient precision as to be answerable through quantitative research. The authors motivate their theoretical perspective with a literature review, and with a computational model. The study proposed falls within established ethical norms for working with children of this age.
1B. The logic, rationale, and plausibility of the proposed hypotheses, as applicable.
The proposed hypotheses are coherent and credible and are very robustly motivated. The authors motivate their primary hypotheses with a detailed, evaluative literature review, and a computational model. Secondary hypotheses are motivated empirically (on the basis of previous studies and/or pilot data). Both types of hypothesis are stated precisely and are sufficiently conceivable to be worthy of investigation. They follow either directly from the research question, or indirectly via empirical evidence (in the latter case, the analyses will yield important additional information that might lead to modifications of the theory). The pilot studies (pilot 1 and 2) are also well designed and well explained (though I have one point regarding the chance level of 25%, which I address under 1C below).
However, I would like the authors to address one point here regarding the statement (page 31) that they also plan to include measures of individual differences (e.g. attention, vocabulary) for exploratory analysis. I recognise that these are exploratory analyses, but the authors should motivate them in some way: what factors will be assessed here, what relevant information might the tasks yield, why are individual differences of interest here, etc. In addition, these tasks are not mentioned at all in the methods section (see point a under 1C below).
1C. The soundness and feasibility of the methodology and analysis pipeline (including statistical power analysis or alternative sampling plans where applicable).
The study procedures and analyses are incredibly well described and are valid. Critical design features such as randomisation, rules for exclusion, etc. are present and fully explained.
Please note that I have not conducted Bayes analyses myself, so my knowledge of what needs to be considered is purely theoretical. Bearing in mind that caveat, the proposed sampling plan is rigorously described, and the thresholds for evidence at different levels (strong, moderate etc) for H1 and H0 are clearly described.
However, I have three points for the authors to consider:
a. As mentioned above, the authors state on page 31 that they also plan to include measures of individual differences (e.g. attention, vocabulary) for exploratory analysis. There is no mention of these tasks at all in the methods section. If these tasks are to be included, please add the usual methodological details (e.g. what the tasks will be and how the data will be collected).
b. Motivation for the choice of statistical priors. Their priors are defined on the basis of effect sizes taken from the pilot study, which was conducted in person in the children's schools. I think there is now strong evidence that data collected online tends to be noisier, and effect sizes smaller, than data collected in person. This is particularly the case in studies with children, and even more so when the in-person data were collected in a structured environment such as a school, where there are minimal distractions. If their online study yields substantially smaller effect sizes than their pilot data, will their study still be adequately powered to find strong/moderate evidence for H1 and/or H0 for all their hypotheses?
c. Chance level. On page 35, the authors state that they will remove trials in which children make 'illegal' moves (e.g. placing the object on a distractor cell), so that chance level is 50%. However, I'm not sure this is right. Even on trials in which children make legal moves, they still have the possibility of making an illegal move (placing an object on a distractor cell, or choosing a distractor object). So even on legal moves (which are included in the analysis), chance is less than 50%. This doesn't (I don't think) have any profound implications for the analysis, because none of the analyses compare performance with chance (though the authors should check this). But either way, if I'm right, the authors need to:
· Calculate the actual chance levels across trials and state them in the paper,
· or remove distractor objects and cells,
· or (and this is my preferred option) make it impossible to place objects on distractor cells (i.e. make distractor objects non-movable and make objects ping back to their original position if you try to place them on a distractor cell).
Note that this issue also applies to the analysis for pilot 1, where chance is set to 25% for the same reason (page 58). Again, I don't think this is right; i.e. removing trials from the analysis where children make illegal moves doesn't make any difference to the chance level on legal trials. But please tell me if I'm wrong about this - I may have misunderstood what the authors did here.
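To make the two readings of "chance" concrete, here is a minimal sketch of the arithmetic. It assumes a child who responds uniformly at random over every available move, legal or illegal; the function name and the move counts in the usage lines are hypothetical and are not taken from the manuscript. The authors would need to substitute the real counts for each trial type.

```python
from fractions import Fraction


def chance_levels(n_legal: int, n_illegal: int, n_correct: int = 1):
    """Chance of a correct response for a uniform random responder.

    Returns a pair:
      - unconditional: probability of a correct move out of ALL available
        moves (legal and illegal),
      - conditional: probability of a correct move GIVEN that the move
        made was a legal one (i.e. after illegal trials are dropped).
    All counts are illustrative assumptions, not values from the paper.
    """
    if n_correct < 1 or n_legal < n_correct or n_illegal < 0:
        raise ValueError("need 1 <= n_correct <= n_legal and n_illegal >= 0")
    total_moves = n_legal + n_illegal
    unconditional = Fraction(n_correct, total_moves)
    conditional = Fraction(n_correct, n_legal)
    return unconditional, conditional


# Hypothetical trial: 2 legal moves (1 correct) and 2 illegal distractor moves.
unconditional, conditional = chance_levels(n_legal=2, n_illegal=2)
print(unconditional, conditional)  # prints "1/4 1/2"
```

Under these assumptions the two figures differ whenever illegal moves are possible, so it would help if the paper stated which of the two it reports as chance, and why.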
1D. Whether the clarity and degree of methodological detail is sufficient to closely replicate the proposed study procedures and analysis pipeline and to prevent undisclosed flexibility in the procedures and analyses.
The protocol certainly contains sufficient detail to be reproducible and ensure protection against research bias, and specifies precise and exhaustive links between the research question(s), hypotheses, methods and results. The design summary table is very useful.
Please note that some reviewers might state that they find the introduction section overly long. It is, indeed, very comprehensive. However, I appreciate this. It lays out, very clearly, the authors' theoretical perspective and the learning mechanism they are proposing, and neatly evaluates all the relevant previous literature. As many of us are now arguing, there aren't nearly enough papers in the child development literature that really get to grips with potential mechanisms of development; i.e. we have too few papers that accurately explain, in detail, how learning processes might work. Thus, I find it admirable that the authors have prepared such a careful, detailed review. There is only one sub-section I might shorten, which is the one describing the study by Hsu & Bishop (2014). However, even here the detail can be justified given how closely that study relates to this one.
1E. Whether the authors have considered sufficient outcome-neutral conditions (e.g. absence of floor or ceiling effects; positive controls; other quality checks) for ensuring that the obtained results are able to test the stated hypotheses or answer the stated research question(s).
The proposal contains all the necessary data quality checks (though please note that my worry about the statistical priors, detailed in 1C above, is also relevant here). The proposed statistical tests are appropriate and outcome neutral. The pilot data suggest that there are unlikely to be floor or ceiling effects. Positive controls are appropriate (high, low, skewed variability input).
Page 10, footnote: "From a theoretical perspective, we do not believe there is any good reason to expect transfer to new constructions..." I don't quite agree with this; there is good evidence for construction-general transfer in some circumstances (e.g. Abbot-Smith and Behrens' wonderful "construction conspiracies" paper (2006)).
Page 12: Both paragraphs on skewed distributions. I found it really hard to follow these two paragraphs; paragraph 1 seems to be saying there is a skewed distribution in natural language, and paragraph 2 contradicts that. I think I know what the authors are saying but it's a bit confusing. Can they rephrase? It would also be useful to give a short definition of a 'geometric distribution' in the text, so readers don't have to read the footnote to understand what it is (footnotes should probably just include additional information, not information essential to understanding the main text).
Page 18: "word order is not captured by the model". I wondered what consequences this had for learning, and for the comparison with real children, since children certainly do have access to word order cues. If word order *was* captured by the model, how would this change the pattern of results (if at all)? Could the authors speculate here?
Page 27: difference between simulation results (no benefit of skew) and empirical results (benefit of skew). I did wonder what the simulation results looked like earlier in the learning cycle. One possibility is that the simulations have just learned the generalisation much better than children by the end of the learning cycle. So if we want to replicate empirical results (especially from children with DLD) we might want to look at the data earlier in the learning cycle of the simulation. If we administer the test session earlier in the model's learning cycle, is there any evidence of a skew advantage?
Page 60; Table 7. Please add a description of the three "response types" to the label of table 7. I was initially confused until I realised these referred back to the "four possible moves" described on page 58.
Throughout, but especially in the sections describing the pilot, two different labels are used for the skewed input: 1) skew-bimodal/skew and 2) exponential/geometric. This can make the paper a bit difficult to parse (especially on pages 24 and 25, where the labels on figure 4 refer to skew-bimodal and exponential conditions, but the description of the figure in the text uses skew and geometric labels).
There are a number of typos throughout, so a good proofread would be useful. I have not listed them here because of the time that would take. If the authors would like to see these, please can they send me an editable version of the manuscript (e.g. Google Docs or Overleaf) and I can use track changes to point them out. (NB noueni and noshitani are sometimes italicised and sometimes not).