Do songs differ from speech similarly across cultures?

Corina Logan

Do songs differ from speech similarly across cultures?

Corina Logan based on reviews by Nai Ding, Fermin Moscoso del Prado Martin and Makiko Sadakata

A recommendation of:

STAGE 1

A Programmatic Stage 1 Registered Report of global song-speech relationships replicating and extending Ozaki et al. (2024) and Savage et al. (2024)

Patrick Savage, Zixuan Jia, Yuto Ozaki, Danya Pavlovich, Suzanne Purdy, Adwoa Ampiah-Bonney, Aleksandar Arabadjiev, Flavia Arnese, Joshua Bamford, Brenda Barbosa, Ann-Kathrin Beck, Anne Cabildo, Gakuto Chiba, Shahaboddin Dabaghi Varnosfaderani, Simion Echim, Shinya Fujii, Shira Gabriel, Massimo Grassi, Lucrezia Guiotto Nai Fovino, Jan Hajič jr, Kai Yi Han, Niels Chr. Hansen, Martin Hartmann, Ye He, Sotirios Kolios, Wojtek Krzyżanowski, Urise Kuikuro, Dilyana Kurdova, Fang Liu, Psyche Loui, Zoya Mikova, Dayna Moya, Rogerdison Natsitsabui, Minja Niiranen, Nozuko Nguqu, Petr Nuska, Florence Ewomazino Nweke, Patricia Opondo, Hineatua Parkinson, Mark Lenini Parselelo, Gemma Perry, Peter Q. Pfordresher, Piotr Podlipniak, Tudor Popescu, Robert M. Ross, Zhuozhuang Shi, Javier Silva-Zurita, Ignacio Soto-Silva, Barbora Štěpánková, William Forde Thompson, Sebastian Vaida, Christina Vanden Bosch der Nederlanden https://doi.org/10.31234/osf.io/c2dba_v5 version 5

Read report on server

Abstract

ZH-CN

Submission: posted 29 November 2024
Recommendation: posted 05 May 2025, validated 08 May 2025

Cite this recommendation as:
Logan, C. (2025) Do songs differ from speech similarly across cultures?. Peer Community in Registered Reports, . https://rr.peercommunityin.org/articles/rec?id=961

Recommendation

Most investigations of the relationship between music and speech – two features considered universal to human societies (Brown 1991) – focus on only a few languages in an English and/or European-centric context (Savage et al. 2015). A recent study investigated this relationship in many more languages in a global context and found that songs are different from speech in a few variables (e.g., they are slower), using a relatively small sample per language (Ozaki et al. 2024). The current study (Savage et al. 2025) is conducting a large-scale replication to increase the sample size and understand whether the results reported by Ozaki et al. (2024) persist.

In this registered report, Savage and colleagues (2025) will measure pitch height, temporal rate, and pitch stability in 15-30 individuals at 26 sites (languages) around the world. This much larger sample size per language will allow them to determine whether certain languages have different relationships between song and speech, or whether they vary together in the same way across cultures globally. These results will provide robust evidence from which debates about the function of music can move forward.

Savage and colleagues (2025) have innovated a more equitable way of conducting big team research using PCI RR’s Programmatic Registered Report mechanism, where one Stage 1 can result in multiple Stage 2 manuscripts. All co-authors are on the Stage 1 manuscript, whereas each site will produce its own Stage 2, thus allowing many authors to move into the first and last author positions and receive the deserved credit for their contributions.

The Stage 1 manuscript was evaluated over three rounds of in-depth review. Based on detailed responses to the recommender and reviewers' comments, the recommender judged that the manuscript met the Stage 1 criteria and therefore awarded in-principle acceptance (IPA).

URL to the preregistered Stage 1 protocol: https://osf.io/vs9my

Level of bias control achieved: Level 6. No part of the data or evidence that will be used to answer the research question yet exists and no part will be generated until after IPA.

List of eligible PCI RR-friendly journals:

Advances in Cognitive Psychology
Advances in Methods and Practices in Psychological Science *pending editorial consideration of disciplinary fit
Biolinguistics
Collabra: Psychology
Experimental Psychology *pending editorial consideration of disciplinary fit
Journal of Cognition
Meta-Psychology
Peer Community Journal
PeerJ
Royal Society Open Science
Studia Psychologica
Swiss Psychology Open

References

1. Brown, D. E. Human Universals. (Temple University Press, Philadelphia, 1991).

2. Ozaki, Y., Tierney, A., Pfordresher, P. Q., McBride, J. M., Benetos, E., Proutskova, P., ... & Savage, P. E. (2024). Globally, songs and instrumental melodies are slower and higher and use more stable pitches than speech: A Registered Report. Science Advances, 10, eadm9797. https://doi.org/10.1126/sciadv.adm9797

3. Savage, P. E., Brown, S., Sakai, E., & Currie, T. E. (2015). Statistical universals reveal the structures and functions of human music. Proceedings of the National Academy of Sciences, 112, 8987-8992. https://doi.org/10.1073/pnas.1414495112

4. Savage P. E., Jia Z., Ozaki Y., Pavlovich D. V., Purdy S., Ampiah-Bonney A., … & der Nederlanden, C. (2025). A Programmatic Stage 1 Registered Report of global song-speech relationships replicating and extending Ozaki et al. (2024) and Savage et al. (2025). In principle acceptance of Version 5 by Peer Community in Registered Reports. https://osf.io/vs9my

Conflict of interest:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Reviews

Evaluation round #2

DOI or URL of the report: https://doi.org/10.31234/osf.io/c2dba_v4

Version of the report: 4

Author's Reply, 01 May 2025

Download author's reply Download tracked changes file

Decision by Corina Logan, posted 22 Apr 2025, validated 22 Apr 2025

Dear Dr Savage and co-authors,

Thank you for your revised submission and for responding to feedback. The methods section is greatly improved in terms of having more information about this research, rather than having to read methods sections from two other articles to get the full picture. And the introduction is more fleshed out in terms of what is driving this research and what it means. Now that these details are included in the manuscript, it is easier to assess the proposed work.

The main issue now is that the Stage 1 reads like an outline that is addressing points in the author guidelines at PCI RR, rather than being a full article that follows the author guidelines without explicitly discussing the guidelines. In its current state, it is unlikely that authors will be able to copy and paste the intro and methods into their Stage 2s without major revisions to the text, which is not permitted at Stage 2 (see section 2.10 in the PCI RR author guidelines https://rr.peercommunityin.org/help/guide_for_authors#h_97949820420921613309536944). For example, I appreciate that you made the section “Equitable coauthorship in global collaboration”. However, now that I see it in the context of this draft, I think it should be put into supplementary material because it is a commentary on your workflow and how you will divide up the labor that goes into the Stage 2 articles - the process behind the articles. This is useful for people to know because you are developing a new way of working together, but this is not the focus of your research and I think it distracts from the research in the main text.

Other elements that make the Stage 1 look more like an outline than a final draft are along the following lines… Referring to PCI RR’s programmatic track in the abstract and intro should be removed, and instead discuss how the outputs will be conducted and how this will benefit the research. Instructions to recommenders and reviewers can be put in author responses and cover letters, or you might want to put this in supplementary material because you are developing a new way of working together that others might want to use. Table 1 should be included in the main text (maybe in the methods?) rather than as a separate page that has no section title between the abstract and introduction (also landscape orientation for this table would make it fit better on the page). The “Hypothesis” section should be incorporated into the regular text without the word “Hypothesis” denoting it as a separate section. The Hypothesis section has methods in it that should be moved to the methods section. Explicit references to things like PCI RR’s criterion 2D should be removed because the content in the intro and methods section should state what will be done (which will be in accordance with this criterion, but without explicitly pointing it out). In the methods section, don’t refer to Stage 2s or first authors of Stage 2 articles and so on, just say what the minimum sample size is for each population/site (whatever term you want to use) and how each population will be coded (e.g., line 228). There are many other elements along these lines, including in the methods section, but hopefully this feedback gives you enough of an idea to be able to change the rest of these elements.

In other words, this Stage 1 should be about the research you are proposing to conduct and not a discussion about your workflows (you can put the latter in supplementary material). Think of this Stage 1 as being the final Stage 2 article (but without the results and discussion sections) - this will be what ends up as the final abstract and intro and methods (of course, modified for each Stage 2 article). I have several further comments; please see below.

The track changes file doesn’t show individual insertions and deletions, and some changes don’t show up at all. For example, it looks like the entire abstract and whole paragraphs in the intro and methods are new. However, when I compare the previous version with the current version in Draftable, I can see that there were several insertions and deletions in both. Please make sure to have clear and correct track changes files, which will help the review process go faster.

I look forward to receiving your revised submission.

All my best,

Corina

Other comments:

Line 181 - how did you empirically determine the optimal sample size? What is the minimum sample size?

Figure 2 and legend needs panel numbers

Line 253 and throughout the manuscript - add year to the in text citation

Line 403 - add a citation for distinguishing between thresholds

Interrater reliability - I am not sure what the standard threshold is in the fields of linguistics and music, however, for comparative cognition, which is my field, ICCs must be 0.90 or higher. I found an article on “A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research”. They suggest: “As a rule of thumb, researchers should try to obtain at least 30 heterogeneous samples and involve at least 3 raters whenever possible when conducting a reliability study. Under such conditions, we suggest that ICC values less than 0.5 are indicative of poor reliability, values between 0.5 and 0.75 indicate moderate reliability, values between 0.75 and 0.9 indicate good reliability, and values greater than 0.90 indicate excellent reliability.2” (Koo and Li 2016)

Therefore, I recommend raising your passing threshold from 0.60 to at least 0.75, but 0.80 would be better. This is a crucial piece because it is an indicator of the quality of the data you will be analyzing, and what goes into the models needs to be as clean as possible to obtain more rigorous results from the model.

Reference: Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016 Jun;15(2):155-63. doi: 10.1016/j.jcm.2016.02.012.

Regarding your responses in the Author response document…

- Table 1 only had 4 minor edits, so it was not a “substantial expansion”. Did you intend to make substantial changes to this table?

- The video tutorial is great! I think a written protocol would also be handy for people who don’t want to scroll through the video if they have specific questions they want a refresher on.

Reviewed by Fermin Moscoso del Prado Martin, 09 Apr 2025

I think this is an interesting and well-designed study. The authors have satisfactorily address my suggestions in the previous round.

Evaluation round #1

DOI or URL of the report: https://doi.org/10.31234/osf.io/c2dba_v3

Version of the report: 1

Author's Reply, 02 Apr 2025

Download author's reply Download tracked changes file

Decision by Corina Logan, posted 04 Mar 2025, validated 05 Mar 2025

Dear Dr Savage and co-authors,

Thank you for your submission to PCI RR. I appreciate that you are using the Programmatic Registered Report innovation to its fullest potential and I love that you are using it to improve the equitable sharing of co-authorship. I’m glad to be involved in this process!

I have received feedback from three reviewers and, combined with my own feedback, I welcome a revised version of your Stage 1. See below for the reviewer feedback. My main comment is that the abstract, introduction, and methods need more detail to be replicable by team members and others (see review criterion 1D at https://rr.peercommunityin.org/help/guide_for_recommenders#h_6759646236401613643390905).

Additionally, I thank you for having a quick, pre-review back and forth with me, which resulted in a partial revision based on some of the changes I suggested. I include your author response to my comments in the PDF below, which shows my detailed comments on your submission and how you have already addressed some of them. I note that the question about the level of bias for this submission (currently proposed as a level 6) is still unanswered. From Savage et al (2025), it looks like your planned data collection start date was 1 Dec 2024, which means that some of the data (the recordings) for the current submission is already being collected. That means that this submission would be a level 4 or below. Please explain what stage the data collection is at and how visible the data are to the authors.

I look forward to receiving your revision.

All my best,

Corina

Download recommender's annotations

Reviewed by Nai Ding, 27 Feb 2025

The study is a well motivated extension of Ozaki et al. (2024). It extends the Ozaki et al. study by adding more samples. The writing is clear and the analyses are straightforward. I had the same concern I raised for Ozaki et al. (2024) - That is "inter-onset interval" is a highly ambiguous word. When taking about inter-onset interval for speech, one may think about inter-syllable-onset interval, inter-word-onset interval, inter-phoneme-onset interval, etc. The same applies for music. The term is not even defined in the draft, but even if it's defined the readers should be reminded, e.g., in the abstract and conclusion, about what kind of intervals are being considered here.

Reviewed by Fermin Moscoso del Prado Martin, 04 Mar 2025

I find this is overall a well-designed study with a clear rationale. Methodologically, overall the study is clear and the statistical analyses planned are --in general-- adequate (some caveats below). The methodological aspects that I think require further attention are the following:

1.- Choice of "acoustic units". The authors propose limiting the analyses to a fixed number of acoustic units. If I understood correctly these are syllables. This might be problematic. Whereas the syllable is indeed a crucial unit in many languages (referred to as syllable-timed languages in linguistics, examples in this study's set would be Spanish or Mandarin, among others), it is not so for all languages. Other languages only use the spacing between *stressed* syllables as main units (referred to as "stress-timed languages" in linguistics, and in their set these would include English or Farsi), and finally, other languages do not use syllables at all as their timing units, relying instead on "moras" (referred to as "mora-based" languages, such as Japanese). In view of this typological difference I would suggest than rather than syllables (which are not acoustic units in some of their languages) I would fix either a time duration, or use some other units such as phones or words. This is particularly relevant for any measures of speed.

2.- Choice of songs. According to the design, it seems like the choice of songs that the subjects will produce will be made by the experimenters. This could be subject to a confirmation bias. The chosen songs could --inadvertedly-- be chosen to have higher pitches. May be allowing subjects to choose their own songs, or choosing the songs according to some explicit criterion could help on this.

3.- On the analysis side of things, not much discussion is given on how they would deal with different languages providing different results. It seems like they consider a binary result, but the truth may well be that some languages show that distinction, some may show the opposite, and some might be unclear. How will such cases be dealt with. Will they explore the socio-cultural and linguistic typological differences that may lead to those?

Reviewed by Makiko Sadakata, 28 Feb 2025

Dear Editor and Authors,

This is a well-designed and thoughtfully structured study that builds on previous research. The project has clear research questions, a solid methodological foundation, and well-motivated hypotheses. Its collaborative nature adds further value. I have a few questions, which are intended to enhance clarity and strengthen the study’s design rather than to criticize it.

Structure
The proposed structure is well-coordinated. I understand that (1) since each site tests the hypotheses independently and reports its own findings, this avoids issues of overcrowded reporting, and (2) the meta-study synthesizes broader trends, adding value without redundancy. However, because this structure is new to me, I have a few clarification questions.

While the Stage 1 report ensures 'in principle acceptance' for the three pre-registered hypotheses at each site, it does not extend to any additional hypotheses that sites may introduce. Since these additional hypotheses do not alter the core findings related to the pre-registered questions, I do not see this as a major concern. However, a brief conceptual clarification on how the scientific rigor of these additional hypotheses is maintained—whether through a standardized process or left to individual sites—would be appreciated. This would also help clarify how these hypotheses will be evaluated during the review process. If their assessment is entirely left to the reviewers of each separate paper, it would be useful to make this explicit so that expectations are clear.

The paper states that there will be "up to" 27 individual reports, which suggests that some sites may not publish their findings independently. If some sites do not publish, will their data still be included in the meta-analysis? If so, how will their data be handled to maintain quality control?

Given the scale of coordination, it would also be helpful to consider potential unexpected scenarios and corresponding action plans. For example, in Savage et al. (2025), if a certain percentage of sites failed to deliver data, a contingency plan was in place to supplement missing data. Would a similar stepwise plan be considered here to ensure the meta-analysis remains robust even if some sites do not complete their reports or data collection on time?

The original data collection plan in Savage (2025) is set to be implemented at 57 (or 60? There seems to be a discrepancy between the two papers—please check) sites, while this study will use data from 26 of those sites. To enhance transparency, could you clarify whether the selection of these 26 sites was determined before data collection began? If the selection took place after data collection but before transcription, could this introduce potential selection bias?

Methods (Discussion point)
If participants first engage in a karaoke-style singing task, could this prime them to a specific tempo, influencing their subsequent monophonic singing? The study examines tempo differences between speech and song. In this context, is it important to consider whether the methodology captures the natural singing tempo? The karaoke accompaniment might shape the singing tempo rather than reflect a spontaneous pace. While I understand that data collection may already be in progress and the protocol cannot be changed as it is pre-registered, it would be valuable to discuss this as a potential impact.

Sincerely, Makiko Sadakata

Savage, P. E. et al. Does synchronised singing enhance social bonding more than speaking does? A global experimental Stage 1 Registered Report [In Principle Accepted]. Peer Community Regist. Rep. (2025) doi:10.31234/osf.io/pv3m9.