DOI or URL of the report: https://osf.io/5qjuk/?view_only=6157272caf124259b467900f6f0664b3
Version of the report: v2
Dear editor and reviewers,
Again, thank you for your detailed review of the manuscript and for all the comments. We have taken all the comments, concerns and suggestions into account and addressed them in the revised manuscript. First, we have addressed your concern relating to the infants’ age and reading as an activity. Here, we have decided to exclude fathers who have not engaged in shared book reading at all since the infant was born. Our previous and current experience in the lab and with parents suggests that parents do read to their infants and that this exclusion criterion should not exclude too many fathers. Furthermore, we would like to clarify that the picture book used in the study is not a book per se, but more like a picture book with short dialogues meant to elicit the target words. Second, we included more information about how we will interpret the ICC in the current study, as well as more information on the protocol for making the source recordings for the infant preference task. Finally, we have edited some paragraphs to make them clearer and corrected some errors that were raised by the reviewers.
Please see the point-by-point responses (in the revised manuscript with track changes) to the reviewers’ comments and concerns, as well as the highlighted text in the manuscript for edits and added text.
We believe that we have addressed the concerns and issues raised by the reviewers and that it has resulted in an improved manuscript. We would like to thank you for your time and feedback, and we are looking forward to hearing from you.
Yours sincerely,
The authors
Your revised manuscript has now been evaluated by the three reviewers. I'm happy to see good progress, but there are some remaining issues to iron out before IPA. In my own reading of both the reviews and the manuscript, core issues to address include residual concerns surrounding (1) measurement reliability, (2) crucial methodological detail, and (3) age of the infant sample. Please respond comprehensively to all issues in a revised manuscript and response. Provided you are able to fully address all of the points raised by the reviewers, we may then be able to issue IPA without further in-depth review.
The authors were very responsive to the concerns raised by myself and the other reviewers. Based on a reviewer comment, the paper now includes a criterion that fathers must have read to their child in the past two weeks. I am a bit concerned that this might exclude too many fathers, although the authors do say that they have previous data suggesting that most fathers read to their children. If indeed this criterion proves overly strict, the authors might justifiably deviate from the pre-registration, assuming that this decision is made independent of results. Overall, I look forward to seeing the results of the study once it is complete.
Very small wording suggestions:
p. 20, translation of item "Parents may learn babies to talk by talking with them" could be double checked - might be more apt to say "Parents may teach babies to talk by talking to them"
p. 20, "reverted" --> "reversed"?
I appreciate the authors’ responses to prior feedback. Below I outline some remaining concerns/suggestions:
1. I still find the paragraph (beginning line 286) a bit oddly structured. There seem to be 3 themes – characteristics of Norwegian IDS, cross-cultural/linguistic variation in IDS, and whether vowel space hyperarticulation occurs in IDS. These topics are intersecting but separable and the paragraph in its current form still seems to jump from one to the other.
2. “it is not fully known whether fathers modulate their IDS when speaking with a child” (line 468). I still feel like this overplays the extent to which we do not know about this. At this point there is robust evidence that fathers do engage in IDS – although there is plenty still to debate and discover with respect to how much relative to mothers and/or differences in quality/kind.
3. I am not sure “apply[ing] a conservative approach while interpreting the results” (line 504) solves the problem of the possibility that there is not enough reliability in comparing across individuals to be able to pull out the desired results (particularly with respect to the effect of infant experience with father caregiving). I think this is a really interesting and valuable study, but I still worry that there is a strong possibility that this result will be null and given what we are learning about the lack of test-retest reliability in measures of infant preference, that won’t be interpretable (i.e. while the group level preference for IDS may be robustly measurable, individual differences may not be). The authors state that they will report the ICC, but more information on how they will interpret this measure would be helpful.
4. Small comment: “Parents may _learn_ babies to talk…” (line 629) I assume “teach” is meant?
5. More information is needed about how the stimuli for the infant preference study will be created. Who will be recorded? Will the exact ManyBabies procedure (i.e. pulling objects out of a bag, etc.) be followed? Etc.
6. “each utterance contains 8 words” (line 722) – this is not consistent with the ManyBabies protocol. Please explain?
7. “The recordings in IDS and ADS” (line 743) I assume this is referring to the father’s recordings, not the experimental stimuli, but perhaps best to explicitly state that for clarity.
8. I do not see where it is described how the vowel triangle measure will be calculated from the individual formant information. Since it is mentioned that F3 will be measured, will this be included in the analysis in some form, or just F1/F2? Typically vowel space is calculated as the area between 3 point vowels across F1 and F2, but this is not explicitly stated anywhere and there are other details that need to be articulated. [this may have been an issue in the original submission as well, apologies for not bringing this up earlier if so]
9. In the exploratory analysis, Paternal attitudes and Paternal reading practices were introduced to the model as main effects. Shouldn’t these be interaction effects with register, as with paternity leave duration? I.e. the prediction would be that they would increase the difference between IDS and ADS in the various acoustic measures, not that they would increase overall f0 etc.?
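As an illustration of the conventional calculation referred to in point 8, the following minimal sketch computes a vowel space area as the area of the triangle spanned by three point vowels in the F1/F2 plane, using the shoelace formula. It is not the authors' procedure, which the manuscript does not yet specify. The function name, the choice of /i/, /a/ and /u/ as corner vowels, and the formant values in Hz are all hypothetical and chosen only for illustration; F3 is not used here.

```python
def vowel_triangle_area(v1, v2, v3):
    """Area of the triangle spanned by three (F1, F2) points, in Hz^2.

    Uses the shoelace formula. Each argument is a (F1, F2) pair,
    e.g. the mean formant values of one point vowel for one speaker
    in one register.
    """
    (x1, y1), (x2, y2), (x3, y3) = v1, v2, v3
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0


# Hypothetical mean formants (F1, F2) in Hz for one speaker and register.
# These numbers are invented for illustration, not taken from any data set.
example_formants = {
    "i": (300.0, 2300.0),
    "a": (750.0, 1300.0),
    "u": (350.0, 800.0),
}

example_area = vowel_triangle_area(
    example_formants["i"], example_formants["a"], example_formants["u"]
)
print(f"Vowel space area: {example_area:.0f} Hz^2")
```

Computing this area separately for the IDS and ADS recordings of each father would give one value per speaker and register. Whether to work in Hz or on a perceptual scale such as Bark or mel is among the details that would still need to be stated.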
DOI or URL of the report: https://osf.io/w7k2b?view_only=af30057f71474783a6d7629b985fa4b1
Three expert reviewers have now assessed the Stage 1 manuscript. As you will see, the evaluations are broadly positive about your proposal while also offering a range of points for consideration and constructive suggestions for improvement. The three most substantive issues to address in revision are (1) considering whether the sample size and measurement reliability are sufficient to avoid futility in the analysis of individual differences, (2) resolving the potentially confounding role of paternal knowledge, attitudes, and beliefs, and (3) distinguishing measurement of paternal caregiving from amount of experience reading to the child. The reviewers also ask for consideration of additional literature in the Introduction, additional methodological details, justification of trial numbers, clarification of procedural contingencies, and ensuring that proposed procedures are reproducible from the description in the manuscript (rather than relying on secondary references to previous studies/methods).
Overall, based on these reviews and my own reading of the manuscript, I believe the study is a promising candidate for eventual Stage 1 acceptance and I am happy to invite a revision and response.
Overall, this is a very strong Stage 1 registered report of a study investigating paternal and male infant-directed speech. As the authors point out, there has been insufficient work on this topic, and the unique parental leave policies in Norway provide an interesting test of several important theoretical questions. The introduction outlines the relevant literature and clearly articulates study hypotheses. The methods are strong, and have been used in previous high-quality work. The inclusion of both acoustic measures of fathers’ speech and infant behavioral measures is another strength. In reading the introduction several times I worried about birth order effects, but was reassured when the methods specified that all children would be first-born. It might be useful to mention this design detail earlier in the paper. The authors have provided power analyses, and the study is generally adequately powered. The planned analyses are appropriate to answer the research question, and the document has specified inclusion/exclusion criteria, plans if data are non-normal etc.
My main concern is with the individual differences analysis, examining whether the duration of paternity leave (presumably a proxy for infant exposure to paternal IDS) predicts infants’ preference for IDS. The authors mention that there “is no known effect size of the interaction between the IDS/AADS differences and the duration of paternity leave”, and this is true. However, one issue that has not been considered here is measurement reliability of the infant task – how stable are individual differences as measured by the IDS-ADS preference task?
In our recent paper (Byers-Heinlein, Bergmann, & Savalei, 2022), we calculate the internal consistency of the preference as reported in the ManyBabies Consortium (2020) paper to be .14, and it is important for the power calculations to account for attenuation of correlation due to unreliability. The observed correlation can be calculated from the reliabilities based on Spearman’s (1904) formula, r_observed = r_true × sqrt(r_xx × r_yy), where r_xx and r_yy are the reliabilities of the two measures. Assuming the best case scenario that infants’ exposure to paternal IDS is measured with perfect reliability by duration of paternity leave (itself a questionable assumption), if the true relationship between leave duration and preference is perfect (r_true = 1), then r_observed = .36; if r_true = .7, then r_observed = .26; if r_true = .3, then r_observed = .11. The authors will definitely want to run their own calculations and apply them to their linear mixed regression approach. However, I suspect that at any reasonable sample size of infants (besides something on the order of ManyBabies) it will not be possible to detect individual differences as measured by this looking time task, given its low reliability. It is certainly an interesting research question and a plausible hypothesis, but without a more reliable infant task, it may not be possible to answer.
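The attenuation arithmetic above can be reproduced with a short sketch. It assumes, as in the text, a reliability of 1.0 for paternity leave duration and .14 for the infant preference measure; the function and variable names are illustrative only. With these inputs the first case evaluates to roughly .37, slightly above the .36 quoted, presumably a rounding difference.

```python
import math


def attenuated_correlation(r_true, rel_x, rel_y):
    """Spearman's (1904) attenuation formula.

    Returns the correlation expected to be observed when the true
    correlation is r_true and the two measures have reliabilities
    rel_x and rel_y.
    """
    return r_true * math.sqrt(rel_x * rel_y)


# Assumptions taken from the text: leave duration is measured with
# perfect reliability (best case), and preference has reliability .14.
REL_LEAVE = 1.0
REL_PREFERENCE = 0.14

for r_true in (1.0, 0.7, 0.3):
    r_obs = attenuated_correlation(r_true, REL_LEAVE, REL_PREFERENCE)
    print(f"r_true = {r_true:.1f} -> r_observed = {r_obs:.2f}")
```

The observed correlation obtained this way, rather than the hypothesised true correlation, is the effect size that a power analysis for the individual differences question would need to target.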
Byers-Heinlein, K., Bergmann, C., & Savalei, V. (2022). Six solutions for more reliable infant research. Infant and Child Development, e2296. http://doi.org/10.1002/icd.2296