DOI or URL of the report: https://doi.org/10.31234/osf.io/c2dba_v4
Version of the report: 4
Dear Dr Savage and co-authors,
Thank you for your revised submission and for responding to the feedback. The methods section is greatly improved: it now contains enough information about this research that readers no longer need to consult the methods sections of two other articles to get the full picture. The introduction is also more fleshed out regarding what is driving this research and what it means. Now that these details are included in the manuscript, it is easier to assess the proposed work.
The main issue now is that the Stage 1 reads like an outline addressing points in the PCI RR author guidelines, rather than a full article that follows those guidelines without explicitly discussing them. In its current state, the authors are unlikely to be able to copy and paste the introduction and methods into their Stage 2s without major revisions to the text, which is not permitted at Stage 2 (see section 2.10 in the PCI RR author guidelines: https://rr.peercommunityin.org/help/guide_for_authors#h_97949820420921613309536944). For example, I appreciate that you added the section “Equitable coauthorship in global collaboration”. However, now that I see it in the context of this draft, I think it should be moved to supplementary material because it is a commentary on your workflow and how you will divide up the labor that goes into the Stage 2 articles: the process behind the articles. This is useful for people to know because you are developing a new way of working together, but it is not the focus of your research, and I think it distracts from the research in the main text.
Other elements that make the Stage 1 look more like an outline than a final draft include the following:
- References to PCI RR’s programmatic track in the abstract and introduction should be removed; instead, discuss how the outputs will be produced and how this will benefit the research.
- Instructions to recommenders and reviewers can go in author responses and cover letters, or in supplementary material, since you are developing a new way of working together that others might want to use.
- Table 1 should be included in the main text (perhaps in the methods?) rather than as a separate page with no section title between the abstract and introduction (landscape orientation would also make this table fit better on the page).
- The “Hypothesis” section should be incorporated into the regular text without the word “Hypothesis” denoting it as a separate section. The Hypothesis section also contains methods that should be moved to the methods section.
- Explicit references to things like PCI RR’s criterion 2D should be removed; the introduction and methods should simply state what will be done (which will be in accordance with this criterion, but without explicitly pointing it out).
- In the methods section, do not refer to Stage 2s or first authors of Stage 2 articles and so on; just state the minimum sample size for each population/site (whatever term you prefer) and how each population will be coded (e.g., line 228).
There are many other elements along these lines, including in the methods section, but hopefully this feedback gives you enough of an idea to change the rest of them.
In other words, this Stage 1 should be about the research you are proposing to conduct, not a discussion of your workflows (the latter can go in supplementary material). Think of this Stage 1 as the final Stage 2 article, minus the results and discussion sections: it will become the final abstract, introduction, and methods (modified, of course, for each Stage 2 article). I have several further comments; please see below.
The track changes file doesn’t show individual insertions and deletions, and some changes don’t show up at all. For example, it looks as though the entire abstract and whole paragraphs in the introduction and methods are new; however, when I compare the previous version with the current version in Draftable, I can see that there were several insertions and deletions in both. Please make sure to provide clear and correct track changes files, which will help the review process go faster.
I look forward to receiving your revised submission.
All my best,
Corina
Other comments:
Line 181 - how did you empirically determine the optimal sample size? What is the minimum sample size?
Figure 2 and its legend need panel numbers
Line 253 and throughout the manuscript - add the year to in-text citations
Line 403 - add a citation for distinguishing between thresholds
Interrater reliability - I am not sure what the standard threshold is in the fields of linguistics and music; however, for comparative cognition, which is my field, ICCs must be 0.90 or higher. I found an article, “A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research”, which suggests: “As a rule of thumb, researchers should try to obtain at least 30 heterogeneous samples and involve at least 3 raters whenever possible when conducting a reliability study. Under such conditions, we suggest that ICC values less than 0.5 are indicative of poor reliability, values between 0.5 and 0.75 indicate moderate reliability, values between 0.75 and 0.9 indicate good reliability, and values greater than 0.90 indicate excellent reliability.” (Koo and Li 2016)
Therefore, I recommend raising your passing threshold from 0.60 to at least 0.75; 0.80 would be better. This is crucial because it indicates the quality of the data you will be analyzing, and what goes into the models needs to be as clean as possible to obtain rigorous results from them.
Reference: Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016 Jun;15(2):155-63. doi: 10.1016/j.jcm.2016.02.012.
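In case it is useful, here is a minimal sketch of how such a reliability check could be scripted in Python with the pingouin library (the column names and ratings below are invented for illustration, and ICC2 is only one of the forms Koo and Li discuss):

import pandas as pd
import pingouin as pg

# Long-format ratings: one row per (segment, rater) pair; values invented.
ratings = pd.DataFrame({
    "segment": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "rater": ["A", "B", "C"] * 4,
    "score": [4.0, 4.5, 4.0, 2.0, 2.5, 2.0, 5.0, 4.5, 5.0, 3.0, 3.5, 3.0],
})

# pingouin returns all six ICC forms; Koo and Li (2016) explain how to
# choose among them based on the model, type, and unit of the study.
icc = pg.intraclass_corr(data=ratings, targets="segment",
                         raters="rater", ratings="score")
icc2 = icc.loc[icc["Type"] == "ICC2", "ICC"].item()
print(f"ICC(2,1) = {icc2:.2f}; passes 0.75 threshold: {icc2 >= 0.75}")

# Note: Koo and Li recommend at least 30 heterogeneous samples and at
# least 3 raters; the 4 segments here are only for illustration.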
Regarding your responses in the Author response document…
- Table 1 only had 4 minor edits, so it was not a “substantial expansion”. Did you intend to make substantial changes to this table?
- The video tutorial is great! I think a written protocol would also be handy for people who don’t want to scroll through the video if they have specific questions they want a refresher on.
I think this is an interesting and well-designed study. The authors have satisfactorily addressed my suggestions in the previous round.
DOI or URL of the report: https://doi.org/10.31234/osf.io/c2dba_v3
Version of the report: 1
Dear Dr Savage and co-authors,
Thank you for your submission to PCI RR. I appreciate that you are using the Programmatic Registered Report innovation to its fullest potential and I love that you are using it to improve the equitable sharing of co-authorship. I’m glad to be involved in this process!
I have received feedback from three reviewers and, combined with my own feedback, I welcome a revised version of your Stage 1. See below for the reviewer feedback. My main comment is that the abstract, introduction, and methods need more detail to be replicable by team members and others (see review criterion 1D at https://rr.peercommunityin.org/help/guide_for_recommenders#h_6759646236401613643390905).
Additionally, I thank you for having a quick, pre-review back-and-forth with me, which resulted in a partial revision based on some of the changes I suggested. I include your author response to my comments in the PDF below, which shows my detailed comments on your submission and how you have already addressed some of them. I note that the question about the level of bias for this submission (currently proposed as level 6) is still unanswered. From Savage et al. (2025), it looks like your planned data collection start date was 1 Dec 2024, which means that some of the data (the recordings) for the current submission are already being collected. That means this submission would be level 4 or below. Please explain what stage the data collection is at and how visible the data are to the authors.
I look forward to receiving your revision.
All my best,
Corina
The study is a well-motivated extension of Ozaki et al. (2024), adding more samples. The writing is clear and the analyses are straightforward. I have the same concern I raised for Ozaki et al. (2024): “inter-onset interval” is a highly ambiguous term. When talking about inter-onset intervals for speech, one may think of inter-syllable-onset intervals, inter-word-onset intervals, inter-phoneme-onset intervals, etc. The same applies to music. The term is not even defined in the draft, but even if it is defined, readers should be reminded (e.g., in the abstract and conclusion) of what kind of intervals are being considered here; a small sketch of the ambiguity follows.
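To make the ambiguity concrete, a minimal sketch in Python (onset timestamps invented for illustration): because IOIs are simply successive differences of whichever onset times are annotated, the same utterance yields different interval series depending on the chosen unit.

import numpy as np

# The same utterance annotated at two different tiers (times in seconds).
syllable_onsets = np.array([0.00, 0.18, 0.35, 0.61, 0.80])
word_onsets = np.array([0.00, 0.35, 0.80])

# "Inter-onset intervals" are successive differences of onset times,
# so the annotation unit determines what the IOIs actually measure.
syllable_iois = np.diff(syllable_onsets)  # inter-syllable-onset intervals
word_iois = np.diff(word_onsets)          # inter-word-onset intervals
print(syllable_iois.mean(), word_iois.mean())  # two different "IOI" values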
I find this overall a well-designed study with a clear rationale. Methodologically, the study is clear and the planned statistical analyses are, in general, adequate (some caveats below). The methodological aspects that I think require further attention are the following:
1.- Choice of “acoustic units”. The authors propose limiting the analyses to a fixed number of acoustic units; if I understood correctly, these are syllables. This might be problematic. Whereas the syllable is indeed a crucial unit in many languages (referred to as syllable-timed languages in linguistics; examples in this study’s set would be Spanish or Mandarin, among others), it is not so for all languages. Other languages use only the spacing between *stressed* syllables as their main units (referred to as “stress-timed” languages in linguistics; in this set these would include English or Farsi), and still other languages do not use syllables at all as their timing units, relying instead on “moras” (so-called “mora-timed” languages, such as Japanese). In view of this typological difference, I would suggest that, rather than syllables (which are not acoustic units in some of these languages), the authors fix either a time duration or some other unit, such as phones or words. This is particularly relevant for any measures of speed.
2.- Choice of songs. According to the design, it seems that the songs the subjects will produce will be chosen by the experimenters. This could be subject to confirmation bias: the songs could, inadvertently, be chosen to have higher pitches. Allowing subjects to choose their own songs, or choosing songs according to some explicit criterion, could help with this.
3.- On the analysis side, not much discussion is given of how the authors would deal with different languages providing different results. It seems that they expect a binary result, but it may well be that some languages show the distinction, some show the opposite, and some are unclear. How will such cases be dealt with? Will the authors explore the socio-cultural and linguistic typological differences that may lead to them?
Dear Editor and Authors,
This is a well-designed and thoughtfully structured study that builds on previous research. The project has clear research questions, a solid methodological foundation, and well-motivated hypotheses. Its collaborative nature adds further value. I have a few questions, which are intended to enhance clarity and strengthen the study’s design rather than to criticize it.
Structure
The proposed structure is well-coordinated. I understand that (1) since each site tests the hypotheses independently and reports its own findings, this avoids issues of overcrowded reporting, and (2) the meta-study synthesizes broader trends, adding value without redundancy. However, because this structure is new to me, I have a few clarification questions.
While the Stage 1 report ensures 'in principle acceptance' for the three pre-registered hypotheses at each site, it does not extend to any additional hypotheses that sites may introduce. Since these additional hypotheses do not alter the core findings related to the pre-registered questions, I do not see this as a major concern. However, a brief conceptual clarification on how the scientific rigor of these additional hypotheses is maintained—whether through a standardized process or left to individual sites—would be appreciated. This would also help clarify how these hypotheses will be evaluated during the review process. If their assessment is entirely left to the reviewers of each separate paper, it would be useful to make this explicit so that expectations are clear.
The paper states that there will be "up to" 27 individual reports, which suggests that some sites may not publish their findings independently. If some sites do not publish, will their data still be included in the meta-analysis? If so, how will their data be handled to maintain quality control?
Given the scale of coordination, it would also be helpful to consider potential unexpected scenarios and corresponding action plans. For example, in Savage et al. (2025), if a certain percentage of sites failed to deliver data, a contingency plan was in place to supplement missing data. Would a similar stepwise plan be considered here to ensure the meta-analysis remains robust even if some sites do not complete their reports or data collection on time?
The original data collection plan in Savage et al. (2025) is set to be implemented at 57 (or 60? there seems to be a discrepancy between the two papers; please check) sites, while this study will use data from 26 of those sites. To enhance transparency, could you clarify whether the selection of these 26 sites was determined before data collection began? If the selection took place after data collection but before transcription, could this introduce selection bias?
Methods (Discussion point)
If participants first engage in a karaoke-style singing task, could this prime them to a specific tempo, influencing their subsequent monophonic singing? The study examines tempo differences between speech and song; in this context, is it important to consider whether the methodology captures the natural singing tempo? The karaoke accompaniment might shape the singing tempo rather than reflect a spontaneous pace. While I understand that data collection may already be in progress and the protocol cannot be changed as it is pre-registered, it would be valuable to discuss this as a potential limitation.
Sincerely, Makiko Sadakata
Savage, P. E. et al. Does synchronised singing enhance social bonding more than speaking does? A global experimental Stage 1 Registered Report [In Principle Accepted]. Peer Community Regist. Rep. (2025) doi:10.31234/osf.io/pv3m9.