Do humans bond more when singing together or speaking together? A global investigation
Does synchronised singing enhance social bonding more than speaking does? A global experimental Stage 1 Registered Report
Recommendation: posted 25 January 2025, validated 29 January 2025
Moore, K. (2025) Do humans bond more when singing together or speaking together? A global investigation. Peer Community in Registered Reports. https://rr.peercommunityin.org/articles/rec?id=890
Recommendation
List of eligible PCI RR-friendly journals:
- Advances in Cognitive Psychology
- Biolinguistics
- Collabra: Psychology
- Experimental Psychology *pending editorial consideration of disciplinary fit
- International Review of Social Psychology
- Journal of Cognition
- Peer Community Journal
- PeerJ
- Royal Society Open Science
- Social Psychological Bulletin
- Studia Psychologica
- Swiss Psychology Open
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.
Reviewed by Erin Hannon, 12 Jan 2025
The authors have done an excellent job responding to reviewer comments, and I think the paper/proposal is now even better. My only lingering concern is about the public goods game: in this situation participants are simply being asked to imagine how much they "would" give, so for me (again) it seems like a real possibility that people might do something different if a resource was actually on the line. However, I agree that no one approach to measuring cooperation is better than the others, and the reasons for not doing this make sense. Perhaps it is just worth including a brief discussion at some point in the paper about the possibility that individuals would behave differently if actual money was at stake, particularly if relevant for interpreting findings. Otherwise I think this is an exciting project and I look forward to seeing the results!
Reviewed by Manuela Maria Marin, 17 Jan 2025
I am pleased to see that the authors have incorporated most of my suggestions, and I am generally satisfied with the revised protocol and the authors' replies.
Introduction/title: The authors should make clear in their Introduction, and perhaps in the title, that they are testing the effect of singing on "social bonding in groups" and not in dyads (e.g., mother-infant bonding), so that readers are not confused and do not draw wrong inferences from the results.
L. 314, moderating variables and familiarity effects: I suggest that the data of groups in which participants know more than two other participants be excluded. From my experience, I predict a strong familiarity effect (either way, whether they know and like each other or know and dislike each other). By definition, three participants already form a social group, and if the group comprises only 5 members, then three participants would already represent a majority... In any case, please keep a close eye on possible familiarity effects.
I wish the authors good luck with their study!
Manuela M. Marin
Evaluation round #1
DOI or URL of the report: https://doi.org/10.31234/osf.io/pv3m9
Version of the report: 6
Author's Reply, 29 Nov 2024
Decision by Katherine Moore, posted 27 Oct 2024, validated 27 Oct 2024
Thank you for submitting your pre-registration report titled “Does synchronised singing enhance cooperation more than speaking does? A global experimental Stage 1 Registered Report”. Four expert reviewers have read your proposal and provided feedback. Collectively, the reviewers praised the study. They believe the work is on an important topic and is likely to have a very positive impact on the literature. They are especially impressed with the scale of the collaboration and cross-cultural investigation. Of course, for that reason, all reviewers also believe it’s important to have the best possible design to maximize the fruits of this labor. They have each offered helpful suggestions on how to improve the study.
The comments include suggestions and concerns about the literature review and theoretical context as well as about the design of the study. Some of the reviewers have raised similar concerns. For example, Drs. Hannon and Marin both suggested videotaping the session to evaluate participants’ compliance with the task instructions, or perhaps going a step further to ensure compliance in the moment. Several reviewers raised concerns about equivalency between the speech and song tasks and across sites. They suggested some possibilities to address this, such as requiring participants to repeat short songs up to a particular duration, or choosing poems instead of songs for the speech task. One important issue for the authors to consider is weighing external (or ecological) vs. internal validity. Drs. Hannon and Göritz both suggest using a measurement of cooperation other than self-report, and provide some suggestions. Drs. Brandon and Marin both make a suggestion for improving terminology. These are some examples, but most reviewer suggestions are unique. Please address all of the reviewer comments in a revision of your report. It is not necessary to take every piece of advice from the reviewers, but you should address their concerns (and explain why you disagreed, if applicable) in a response letter.
Thank you for your submission and I look forward to seeing your revision!
Reviewed by Melissa Brandon, 16 Oct 2024
Dear Editor and Authors,
Overall, this is a very exciting study proposal that has a well thought out plan to collect data across multiple sites around the world. The research question is clear and will provide useful knowledge to the field about music’s impact on social cooperation with additional controls missing from past studies. There is a logical plan for the primary data analysis. I do have a few questions and suggestions for clarity listed below by line numbers. I look forward to seeing the study results in the future.
Sincerely,
Melissa Brandon, Ph.D.
Questions or comments:
Lines 78 & 79: The sentence is worded awkwardly. Can you use the word conversation instead of language, or does that detract from the point of comparing language and music?
Line 191: Can you say more about the measure of cooperation? Is this a normed survey, and if so, in how many cultures or languages? How is the survey being adapted across languages and cultures? I am interested to know how comparable this measure is across different cultures and languages. Add this either here or around line 243 or 329; more information is needed on the measure and its adaptability across sites.
Line 222: The post-interaction assessment of cooperation needs to be stated here in the method plan, along with any information on your other post-test questionnaires. Do they all happen after the interactions? Given that the post-interaction cooperation measure is your main source of data, it should be stated in the procedure plan.
Line 258: Further clarify “able to sing song with lyrics.” Will this self-select to only people willing to sing publicly? That could skew the sample toward people with music experience. On the other hand, are you trying to ensure no one with amusia is in the study? I am asking whether you are looking for a sample with all levels of music experience or trying to exclude anyone.
Line 277: Good clarification of the standards for when and how data are kept in case of small groups and/or technical errors. I am glad this was informed by pilot data.
Line 553: For the “Additional data types”, do you have the number of sites collecting the extra measures or the target number of participants for those additional measures? It would be good to add this if you already know the information.
Lines 572-578: Are all sites doing the post-experiment conditions, or only some? Will the same analysis approach described above be used for the third cooperation measure, or a different analysis, given that this is in the exploratory section? Please clarify.
Looking at Table S1, there are differences in the amounts of compensation participants will be paid. I noticed because the values for the different locations in the UK are very different. I know this is determined by site, but it may be another variable to examine. If the values are drastically different, it could affect motivation to participate and potentially feelings of cooperation. If there is a similar difference in compensation across sites in your pilot data, it may be worth examining whether it is an explanatory variable. This could help you plan how to control for this difference in your study (line 364).
Reviewed by Erin Hannon, 15 Oct 2024
The proposed study asks a valid and novel question about the effects of synchronous musical activity (singing) on cooperation, comparing this with sequential and synchronous (chanting) speech conditions relative to a pre-intervention baseline. The study will have a large, multi-site, multicultural sample and thus address questions about music and cooperation in a much more diverse sample than has previously been tested. There are excellent controls and procedures in place (experimenters are blind to condition, stimulus decisions are left to local experts), and crucial additional variables are being tracked (the extent to which participants knew each other beforehand, music training, etc). At this stage, I have two minor concerns/questions and one more substantive concern about the proposed study.
1. As I understand it, the experiment will be run in a room without an experimenter and instructions will be given over video. This is to ensure experimenter blindness, and it seems fine, except that as far as I can tell there is no procedure in place to measure the extent to which participants followed instructions and did the task as instructed. This is especially important because they are not supposed to interact prior to the singing/speaking, but it also seems important if some groups do a better job than others singing in synchrony, etc. Most of the between-group variation appears to be accounted for through self-report. It seems like it would be a relatively simple step to video-record sessions and later check that participants followed instructions, at least for those sites that can do so. If such procedures are already in place, they should be described in the document, along with details about how compliance would be evaluated.
2. This is perhaps just a matter of clarification, but I did not understand the following: "Although the (synchronised) singing condition and the (simultaneous) conversation conditions differ in the presence of both singing and synchrony, we cannot measure the effect of these factors separately through the comparison of only these two conditions. Therefore, we model the combination of these effects as a single effect, which we name synchronous singing." Does this just mean that one variable will dichotomously code for speaking vs. singing, and another variable will code for synchronous singing vs. not synchronous singing? If so, this could be made clearer (a brief coding sketch follows these points).
3. My biggest concern is regarding the construct validity of the dependent measure(s). The research proposes to measure cooperation with only 4 self-report responses. I realize that other studies examining effects of synchronous activity on cooperation have used similar measures; however, given the magnitude and potential impact of this project, it seems like a missed opportunity to not also have more direct measures of cooperation. If participants figure out what the researchers are measuring (which is likely), this could give rise to demand effects. Yes, this would affect all three conditions, but the pattern of results could reflect more about what participants believe about the effects of various activities on cooperation than about how they actually feel towards other participants after doing the activities. Why not also ask them to make a decision about sharing a resource with the group? There are many simple "games" in the behavioral economics literature that could be easily implemented in Qualtrics. The downside would be ensuring all sites are able to offer some sort of resource or incentive. If the authors disagree, then perhaps they could add some discussion acknowledging the limitations of only using a self-report measure of the outcome variable.
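As a purely illustrative reading of the coding queried in point 2, a minimal sketch with hypothetical condition labels and column names (not taken from the authors' protocol) could look as follows:

```python
import pandas as pd

# Hypothetical coding of the three conditions into two indicator variables,
# as queried in point 2 above. Labels and column names are illustrative only.
df = pd.DataFrame({"condition": ["singing", "recitation", "conversation"]})

# One variable codes singing vs. speaking ...
df["sung"] = (df["condition"] == "singing").astype(int)

# ... and a second codes synchronous performance vs. not, here assuming
# recitation is performed in unison while conversation is not.
df["synchronous"] = df["condition"].isin(["singing", "recitation"]).astype(int)

# "Synchronous singing" is then the interaction of the two indicators,
# which only the singing condition instantiates -- presumably why the
# authors model it as a single combined effect.
df["synchronous_singing"] = df["sung"] * df["synchronous"]
print(df)
```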
Reviewed by Manuela Maria Marin, 01 Oct 2024
I commend the authors' cross-cultural initiative to study the effects of singing and speech on social bonding. Their efforts may lead to a valuable contribution to the existing literature. I hope these comments will be helpful as the authors continue to work on their project.
Introduction:
a) l. 59-66: This section misleadingly reads as if Darwin himself had no theory about the origins of music; in fact, he proposed a sexual selection hypothesis for music, which has also gained empirical support. Please add this information after l. 62; two recent reviews of the literature can be found below:
Bamford, J. S., Vigl, J., Hämäläinen, M., & Saarikallio, S. H. (2024). Love songs and serenades: a theoretical review of music and romantic relationships. Frontiers in Psychology, 15, 1302548.
Marin, M. M., & Gingras, B. (2024). How music-induced emotions affect sexual attraction: evolutionary implications. Frontiers in Psychology, 15, 1269820.
b) l. 56-108. Use of terminology: Please define and introduce the terms social bonding, social cohesion, and prosocial attitudes and behavior. I would also recommend using and focusing on the concept that best fits the dependent variable of interest (prosocial attitudes or social cohesion?). I suggest using the term speech (or speaking) rather than language because speech refers to the auditory component of language and is thus more appropriate in the current research context. I know that the authors are aware of it, but testing the effects of singing (with words) and speech on prosocial attitudes is a very specific comparison that cannot easily be generalized to other forms of human music, such as joint music making involving instruments, dancing, listening to music, and so on.
c) l. 63-81: It would be useful for readers and reviewers to get an estimate of the evolutionary time-scale the authors have in mind for their social bonding hypothesis and the evolution of music and speech in humans more generally. Do the authors think that music and speech may have co-evolved among humans with a common precursor? I think it is important to get further insights into these issues because one may ask why the authors chose speech as a control condition and not some other group activities in which interpersonal synchrony also plays a role, such as group sports, working/walking together, or activities in creative arts unrelated to language. To sum up, the choice of the speech conditions should be better theoretically justified.
d) l. 72-81: Our closest evolutionary relatives are group-living animals, and unlike humans, they do not exhibit complex speech and do not show any signs of music. This is actually a valid argument against the idea that social bonding and cooperation are the (only) driving forces behind the evolution of music. Effective survival in groups requires neither a complex language nor music. Please comment.
e) l. 133, hypotheses: I think that the authors could formulate more specific hypotheses, based on references 43 and 44 and given that they are planning to have two speech conditions (recitation vs. conversation) and a baseline condition. For example, one could write “Singing enhances prosocial attitudes more than recitation and conversation do in comparison to a baseline condition”, or, if the authors want to be even more specific and think that conversation should lead to the lowest levels of prosocial attitudes, the hypotheses could also reflect this. If the authors make changes, please also do so in Table 1. I think that the baseline condition should be mentioned in the hypotheses. If not, please explain why.
f) Are there other studies than references 43 and 44 that have tested the effect of speech on social bonding? If so, they should be cited.
Methods
a) l. 177, group testing and group size: I think it is problematic that the planned group size varies so much (although Rennung & Göritz (2016) did not report a group size effect); especially for the conversation condition, it may make a difference whether conversation takes place among 5 or 10 people in a group. If time is limited, a large group may hinder conversation and prevent every participant from contributing, which makes it a flawed comparison. Note that the authors are aiming for group conversation to happen, and not for individual conversations between pairs. Why not make it 5 people per group across all sites? One can run more than one session per site.
b) l. 201-222. I am also wondering whether the sessions will be filmed and whether one can assess whether singing/reciting/conversation actually took place and whether all participants were involved in the task. I understand that there is no experimenter in the room.
c) l. 190, pre-interaction phase: The instructions say that participants should not interact with each other, but how can one avoid looking/smiling at each other? Some more specific instructions about what not interacting means may be useful to participants.
d) l. 202, pre-selection and length of the songs and other conditions: I think that the authors should be stricter on the length of the songs; the variability is too high. One can assume that forming a bond takes some time and that it makes a difference whether a group sings for 1 vs. 5 minutes (this represents a fivefold increase in duration!). By the way, there is no mention of any duration for the conversation condition. Why not make it 5 minutes for all three conditions? If a song is shorter, it should be repeated until the time is up.
e) l. 209, conversation topic: I am not sure whether song lyrics are an engaging conversation topic, especially if the authors have such a large variety of songs with different themes on their selected list. Certain songs may generate conversation topics much more easily than others (one needs to interpret the meaning of the lyrics before one can make a statement, which can be quite difficult, and I suspect most people will not know much about the background of songs). Some song lyrics may be somewhat confrontational (anthems?), and to be honest, what can one really say about “Happy Birthday”? ;)
Instead, the authors could choose a relatively neutral conversation topic that can be used across cultures, for example, something related to future travel destinations, how to spend an ideal weekend, their opinion about XY, a newspaper headline etc. I think that the topic of the conversation will lead to emotional involvement and that these feelings may affect the outcome. This is actually also true for the song condition, which is why I suggest stricter selection criteria. One could also offer a picture as a stimulus/trigger for a conversation that is the same in all countries. I would not choose anything that is political or related to religion, but something that can easily lead to an engaging conversation. I understand that the respective song lyrics will be part of the statistical model, but it is important that everybody in a group gets quickly involved in an engaging conversation, otherwise the comparison with singing is not justified. Please also mention in the instructions that people should not talk in pairs, which may automatically happen, especially if the authors are planning larger groups.
f) l. 214, Recitation: Please be aware that some song lyrics may contain meter and rhyme whereas others will not, and some will contain lots of repetition whereas others will not. It may thus be important to check the lyrics for these characteristics and either control for them or take them into account in the statistical analysis (not in a pooled random effect). One can surmise that meter/rhyme-based lyrics will have larger effects on bonding than free verse. I understand that the authors thought it may be practical and good to use one song, its lyrics, and its topic across conditions, but at the same time, this may introduce other problems. Why not choose a real poem for the recitation condition? One could agree on a poem with several verses that is typical for a given culture (part of an established canon). One can say that reading a poem in a group is an artificial task, but reading song lyrics (with lots of repetition?) is also an artificial task because song lyrics are usually sung and participants know the song. The use of poems would at least make the results comparable to studies which also used poems and be ecologically more valid.
g) l. 263, exclusion criteria: I think that the authors need some sort of evidence that the participants really participated in the activities of the three experimental conditions. Even if singing/speaking is recorded (by how many microphones?), we do not know whether each individual in a group really performed the task. They may not be honest in the questionnaire. I therefore suggest filming the sessions in addition to recording the voices.
h) I do not see any explicit mention of the debriefing. What information will the participants be given after the experiment?
i) List of songs: I am concerned about the large variability among the chosen songs. The selection criteria could be more stringent in my view. The type of chosen songs varies greatly across the 50 proposed recruitment sites and languages in terms of length, genre, thematic content, emotional tone, semantic content, and appropriate age group. The list of songs in the Appendix contains children’s songs, Christmas songs, lullabies, folk songs, anthems, pop songs, songs one cannot identify due to the language, etc. I wonder how singing a children’s song or Happy Birthday for several minutes feels in a group of (young) adults. It could feel a bit ridiculous and inappropriate for a given social situation. There is also a potentially large difference in terms of bonding when singing one’s national anthem versus a random pop song. More specifically, anthems may evoke strong feelings in some participants, and if the lyrics are part of a conversation task, they may lead to controversial political discussions in which social cohesion is either significantly heightened (in comparison to a typical pop song) or not experienced at all.
This large variability may also affect engagement in the other tasks because at the moment all tasks are centered around one specific song (conversation and recitation task). Perhaps the authors aimed at using a wide range of songs to increase the generalizability of their results, but the chosen song types should not vary so much across sites, especially since there is only one song per site. As expressed above, the length of the songs is also critical and should not show such a large variation in duration. If a song is very short, one could repeat it or add several verses (although singing 5 minutes of, e.g., Happy Birthday is also problematic for other reasons, makes the task artificial, and may not lead to the desired effects). I think the authors should reflect on these issues and propose stricter selection criteria.
Analysis plan
a) l. 325, independent variable: Are the authors aggregating across the recitation and conversation conditions? Why? The experiment is clearly designed to have three conditions (singing, recitation and conversation). It is unclear why these conditions are merged.
b) ll. 327-356, dependent variable, prosocial attitude: The number of items representing the DV is rather low in comparison to the number of all other collected background and moderator variables. Is there no short scale on prosocial attitude available that has good psychometric characteristics? The study would profit from using an established measure, rather than a hodgepodge of items.
c) l. 364, the inclusion of a random effect is good, but as I explained above, the study would benefit from controlling for the length of the condition, the differences in group size, and the type and duration of the selected songs. These points refer to the quality of the manipulation of the experimental conditions and a valid comparison between them. I am not sure whether a pooled random effect is the best way to go ahead. I know that too many random factors can result in a model not converging, but the effect of some factors (particularly those mentioned by Rennung & Göritz, 2016) may be informative. How is “interpersonal dynamics of a given group of participants” assessed?
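By way of illustration only, a minimal sketch of the kind of model structure this comment suggests, fitted on synthetic data with hypothetical variable names (the authors' actual model specification and software may well differ):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: one row per participant, nested in testing groups.
# All names and values are hypothetical, for illustration only.
rng = np.random.default_rng(0)
n_groups, per_group = 60, 6
df = pd.DataFrame({
    "group_id": np.repeat(np.arange(n_groups), per_group),
    "condition": np.repeat(
        rng.choice(["singing", "recitation", "conversation"], n_groups), per_group),
    "n_in_group": np.repeat(rng.integers(5, 11, n_groups), per_group),
    "duration_min": np.repeat(rng.uniform(1, 5, n_groups), per_group),
})
df["prosocial_change"] = (
    np.repeat(rng.normal(0, 0.5, n_groups), per_group)  # group-level variation
    + rng.normal(0, 1, len(df))                          # participant-level noise
)

# Design differences (group size, duration of the activity) entered as fixed
# covariates, plus a random intercept for each testing group, rather than a
# single pooled random effect.
model = smf.mixedlm(
    "prosocial_change ~ C(condition) + n_in_group + duration_min",
    data=df,
    groups=df["group_id"],
)
print(model.fit().summary())
```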
Pilot data
a) In my opinion, pilot data are only relevant for the exact protocol presented in this registered report. If the experimental setup was very different, as ll. 468-478 suggest, the relevance for this study is limited. The experiment mentioned under 4) seems to use the exact methods described here and is informative. Again, here (Fig. 4) the authors present three conditions, and I think the statistical analysis should reflect this as well.
Further data to be collected
l. 510: Moderating variables: I like the list. If the authors change the instructions of the conditions, 7) and 8) need to be changed. One could also ask how natural the task felt to them, in particular if the authors decide to stick with their choice of reciting song lyrics and talking about song lyrics and songs. It may also be informative to ask whether one person acted as a group leader. I am quite sure that if 5 people are asked to sing together, one person will lead the others; the same goes for the other conditions. It may be informative because synchrony will probably work better if one capable person leads the others during the performance of the task.
l. 547, would it not be interesting to ask how much they enjoy singing in general and how often they sing in everyday life? If the authors decide to work with poems, one could ask how much they like reading and literature.
l. 553: Additional data types: I understand that studies comprising many sites are difficult to organize and to lead, and that certain groups may want to collect further data and have different interests, but it is important that the protocol described here is the same across sites and strictly followed, and that participants are not wired at one site and not at another, etc. We should avoid even more variability, particularly if this information is not present as a factor in any analysis (except in a large pooled random effect…). The current task is short, and therefore I do not see a reason for adding more variability to the setup. It can easily be accomplished in any psychology laboratory on this planet for its own sake.
l. 566, synchrony: See my previous comment about filming.
l. 573, post-experiment conditions. I am confused: “participants will be asked to do the other experimental conditions”? I thought that the experiment was a between-subjects design (see l. 171). I am surprised by the idea of making them do all other conditions after the real experiment to see whether cooperation continues to increase after doing multiple conditions in order. Why not run a within-subjects design in the first place?
Manuela M. Marin
Reviewed by Anja Göritz, 26 Sep 2024
The material to be sung, conversed over, or recited is to be a song. This introduces a default bias in favor of singing over speaking. Song could be superior to speech in calling forth prosocial attitudes partly or entirely because of this confound (i.e., being the default) and not because it is causally superior. A clearer test of the hypothesis is possible by using material that either is new or that, at any particular site of experimentation, is spoken as well as sung at about equal frequency, for example, a (sung) poem that is spoken and sung equally often. The former raises procedural issues of training/familiarizing participants with the new material prior to the experiment; the latter raises issues of finding good material.
There was a lack of conceptual clarity in some spots. For example, Table 1: "Contradicts null hypotheses that music is biologically “useless…[c]ompared with language…”3 or “does not directly cause social cohesion”4:" should be expressed more precisely.
Lines 78-81: The quote does not pertain to the statement in lines 78 and 79. What does "directly causing" mean, and is signaling not a form of causation?
Lines 116 & 128: The cited references could not be accessed without installing a plug-in, which I did not do. This barrier to peer review should be removed.
Having merely self-reported prosocial attitude as a dependent measure is methodologically weak. A behavioral measure of cooperation/cohesion should be added; for an example, see https://doi.org/10.1371/journal.pone.0136750. Although something along these lines is planned for some sites of experimentation, it would be better to do it at all sites, also working toward across-site standardization. Pretesting and calibration of the implementation of this behavioral dependent measure should ensure that ceiling/floor effects are unlikely.
Line 325: Given that the independent variable is dichotomous (singing vs. speaking) but there are three independent groups, the "matching" remains unclear. Is the singing group compared to the collapsed recitation and conversation groups? What if singing is inferior in calling forth prosociality to one of the speaking groups but not the other?
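For illustration, the two possible readings could be sketched as follows (condition labels and column names are hypothetical, not taken from the protocol):

```python
import pandas as pd

# The ambiguity raised above, made explicit. Labels are hypothetical.
conditions = pd.Series(["singing", "recitation", "conversation"], name="condition")

# Reading 1: a single dichotomous predictor that pools the two speaking
# conditions, so singing is compared against recitation and conversation
# collapsed together.
collapsed = (conditions != "singing").astype(int).rename("speaking_vs_singing")

# Reading 2: a three-level factor with singing as the reference level, so that
# singing vs. recitation and singing vs. conversation are estimated separately
# and a possible difference between the two speaking conditions stays visible.
separate = pd.get_dummies(conditions, prefix="cond")[
    ["cond_recitation", "cond_conversation"]
]

print(pd.concat([conditions, collapsed, separate], axis=1))
```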
Line 327: "Cooperation" is a long way from what is assessed in this experiment. The naming should be more cautious and proximate to what is assessed. For example, "self-reported attitudinal prosociality" is a better choice.
Lines 216-218: "participants will be asked to repeat the lyric recitation twice to ensure it takes a similar amount of time as the other conditions": There is a dilemma with regard to interpretability/internal validity that should be addressed: either the time is the same but the number of repetitions is unequal, or the number of repetitions is the same but the time is unequal. It would be best to test each horn of the dilemma.
Since the content/manner of the conversation is not scripted and thus uncontrolled, the effect of conversation in calling forth prosociality might depend heavily on how the content/manner of the conversation spontaneously unfolds. For example, if humor or particular insights come up, this might be especially (un)evocative of prosociality.
I advise not collecting demographics 7-11 because they give away the research question or point participants to the emphasis on music. This being an experiment, any demographics and auxiliary variables can and should be restricted to the minimum. Some of the other demographics/auxiliary items appear dispensable as well.